Polypeptide regulation by conditional inteins

ABSTRACT

The present invention relates to methods and reagents for the regulation of a target polypeptide bioactivity by controlled self-excision of an intein.

1. BACKGROUND OF THE INVENTION

[0001] The polypeptide products of genes carry a wide assortment of bioactivities which effect most of the processes required for life, including enzymatic functions, structural functions and the vast majority of biological control functions. Manipulation of these functions for experimental, agricultural or pharmaceutical purposes generally requires polypeptide-specific agonists or antagonists which, respectively, increase or decrease the particular bioactivity of interest. The rational design of small molecule agonist and antagonist ligands is advancing with new strides in the ability to predict target protein structure as well as with advances in combinatorial chemical synthesis and high-throughput screening methodology. Nevertheless, a generally applicable method for controlling the biological activity of a preexisting polypeptide would obviate the need to identify novel and specific polypeptide agonists and antagonists as new biologically important target proteins are uncovered. Furthermore, potential unintended side-effects of a novel polypeptide agonist or antagonist would be prevented with a general method which is responsive to a known biological signal with predictable effects. Conditional mutations provide a means of regulating a particular target polypeptide in response to a particular regulatory signal. For example, temperature-sensitive conditional mutants are responsive to changes in temperature and generally evince reduced bioactivity at a particular temperature, the nonpermissive temperature, which is higher than the permissive temperature, at which bioactivity is greater. In contrast, cold-sensitive mutants generally evince reduced bioactivity at a nonpermissive temperature which is lower than the permissive temperature. The use of such “conditional” mutants is particularly advantageous when studying the function of polypeptides which are “essential” for life, i.e. those polypeptides which encode a bioactivity which is essential for cell survival. Temperature-sensitive mutations in a gene are generally isolated by means of extensive genetic screening for particular missense mutations in the target gene which render the encoded polypeptide thermolabile.

[0002] The heat-inducible N-degron module (U.S. Pat. No. 5,705,387) is a polypeptide structure which, when genetically engineered onto the amino-terminus of a target polypeptide, renders the target polypeptide thermolabile via a mechanism which involves N-end rule dependent proteolysis. Notably, this system results in the rapid degradation of the target polypeptide in the repressed state and so reactivation of the target requires new protein synthesis.

2. SUMMARY OF THE INVENTION

[0003] The present invention contemplates a general method for controlling a target polypeptide bioactivity by engineering the target protein with an inactivating polypeptide insert which can be regulatably excised from the target protein to yield native, biologically active protein in a controlled manner. In preferred embodiments of the invention, the inactivating polypeptide insert employed is a regulatable intein which is introduced into the host protein by genetic engineering of the host polypeptide-encoding gene. Inteins are protein-splicing elements that exist as in-frame fusions with flanking protein sequences called exteins. Naturally occurring inteins appear to constitutively self-splice at the protein level, with their excision being coupled to extein ligation (see e.g. Cooper et al. (1995) TIBS 20: 351-56). At least some inteins encode an endonuclease activity which, once the intein has auto-excised from the host protein, can act to mediate the movement of the insertional element to new sites in the host organism's genome (Cooper et al. (1993) BioEssays 15: 667-73). Inteins are phylogenetically widespread, occurring in all three biological kingdoms: eubacteria, archaebacteria and eukaryotes. The terms extein and intein, as used herein, refer to both the genetic material and corresponding protein products.

[0004] The self-splicing mechanism of inteins has been well characterized and is known to one of ordinary skill in the art. The Intein Database at http://www.neb.com/neb/inteins/html sets forth the general mechanism in detail. Without wishing to be bound to any theory, we set forth the mechanism as known in the art. In general, protein splicing involves four nucleophilic displacements by the 3 conserved splice junction residues. The conserved histidine residue present in the C1 block of the intein assists in Asparagine cyclization and C-terminal cleavage (Xu et al. (1996) EMBO 15(19):5146-5153) by hydrogen bonding to the Asparagine carbonyl oxygen, making this peptide bond more labile. The Threonine and Histidine in conserved block N3 assist in the initial acyl rearrangement at the N-terminal splice junction by hydrogen bonding to main chain atoms and holding the residue preceding the intein in a non-standard cis conformation. Any residue that can form similar hydrogen bonds can substitute for these conserved facilitating residues in Blocks N3 and C1. The mechanism of protein splicing has recently been reviewed by Perler et al. (1997) Nuc. Acids Res. 25:1087-93 and Shao et al. (1997) Chem. & Biol. 4:187-194. Since this mechanism is well documented in the art, designing inteins which retain the self-splicing activity is considered to be well within the purview of the skilled artisan.

[0005] Regulation of the “target polypeptides” on-demand by the method of the present invention is achieved by introducing regulatable protein introns or inteins into the target polypeptides by methods known to the skilled artisan such as homologous recombination. Inteins are a group of related protein elements that are found within a range of host proteins immediately after their translation. Proteins containing the embedded inteins are non-functional. After translation the intein auto-catalytically splices itself out, resulting in a functional host protein and an autonomous intein. Regulation of the self-splicing mechanism so that the self-splicing occurs on demand results in a process which will provide the host or target protein “on-demand”.

[0006] In particular, the self-splicing activity may be agonized or antagonized in response to a signal. Such signals include but are not limited to various internal and external factors including an increase or decrease in temperature, pH, exposure to light, unblocking of amino acid residues by dephosphorylation or deglycosylation, ionic concentrations, concentration of various metals, osmolarity, and/or the presence or absence of certain exogenous chemical agents such as various chemical dimerizer agents including rapamycin and related agents such as AP1510. Examples of exogenous chemicals include agents such as rapamycin or rapamycin analogs useful in mammalian systems and chemicals such as salicylic acid and abscisic acid useful in plant systems. Regulation of self-splicing of an engineered polypeptide at will via a regulating intermediate that could be easily supplied exogenously is particularly advantageous. This allows the production of the functional polypeptide as a function of the exogenously supplied chemical compound.

[0007] This allows control of the formation of the functional target polypeptide so that it is formed only at the appropriate time and to the appropriate extent, and in some situations in particular parts of the living system. In view of considerations like these, as well as others, it is clear that control of the time, extent and/or site of expression of the chimeric gene in plants or plant tissues would be highly desirable. Control that could be exercised easily would be of particular commercial value.

[0008] Other features and advantages of the invention will be apparent from the following detailed description and claims.

3. BRIEF DESCRIPTION OF THE FIGURES

[0009] FIG. 1 shows an intein splicing mechanism.

[0010] FIG. 2 shows the genetic modification of a generalized target gene with a regulatable intein, resulting in regulation of the encoded polypeptide bioactivity by controlled intein excision.

[0011] FIG. 3 shows the regulation of a polypeptide bioactivity by means of controlled intein trans-splicing with an organic dimerizer drug.

[0012] FIG. 4 shows the amino acid sequence of the yeast Sce intein and the positions of allelic changes in conditional mutants. Conserved intein sequence motifs are underlined and numbering is relative to the first amino acid of the intein sequence. The positions of amino acid changes resulting in conditional temperature sensitive (TS) or cold sensitive (CS) mutations are shown as subscripts and the precise amino acid changes are indicated below the sequence, where the first letter indicates the single letter designation of the intein amino acid occurring at the amino acid position designated by the number and the second letter indicates the identity of the substituted amino acid in the mutant. Conditional mutants associated with a single amino acid change are indicated as upper case TS and CS alleles while those associated with more than one alteration are indicated as lower case ts and cs alleles.

[0013] FIG. 5 shows the nucleic acid and amino acid sequence of the Saccharomyces cerevisiae VMA intein-containing TFP1-480 gene (GenBank Accession No. M21609). Numbering of the nucleotide sequence is in accordance with the GenBank entry and the intein-encoding nucleic acid sequence is underlined.

[0014] FIG. 6 shows the nucleic acid and amino acid sequence of the Candida tropicalis VMA intein-containing gene (GenBank Accession No. M64984). Numbering of the nucleotide sequence is in accordance with the GenBank entry and the intein-encoding nucleic acid sequence is underlined.

[0015] FIG. 7 shows the nucleic acid and amino acid sequence of the Chlamydomonas eugametos clpP intein-containing gene (GenBank Accession No. L29402). Numbering of the nucleotide sequence is in accordance with the GenBank entry and the intein-encoding nucleic acid sequence is underlined.

[0016] FIG. 8 shows the nucleic acid and amino acid sequence of the Mycobacterium tuberculosis recA intein-containing gene (GenBank Accession No. X58485). Numbering of the nucleotide sequence is in accordance with the GenBank entry and the intein-encoding nucleic acid sequence is underlined.

[0017] FIG. 9 shows the nucleic acid and amino acid sequence of the GAL4::Sce VMA intein construct used to obtain conditional intein excision alleles.

[0018] FIG. 10 shows a Western blot analysis of the conditional Gal4:INT hybrid constructs.

4. DETAILED DESCRIPTION OF THE INVENTION

4.1. General

[0019] The invention provides compositions and methods for increasing or decreasing the bioactivity of a protein of interest, i.e., a regulatable target protein, by regulating the excision of a protein intron or intein inserted into the target polypeptide. In a preferred embodiment, the bioactivity of the target protein is regulated by inserting an intein encoding intein excision activity into the target protein, such that the excision activity of the intein may be agonized or antagonized in response to a signal. The preferred signals include, but are not limited to, an increase or decrease in temperature, pH, exposure to light, unblocking of amino acid residues by dephosphorylation or deglycosylation, ionic concentrations, concentration of various metals, osmolarity, and/or the presence or absence of certain exogenous chemical agents or ligands.

[0020] The present invention is also directed to compositions comprising the modified target proteins and methods of their production. The modified proteins comprise a regulatable intein sequence inserted into the target protein, wherein the intein is capable of self-excision from the modified protein under predetermined conditions, i.e., an increase or decrease in temperature, pH, exposure to light, unblocking of amino acid residues by dephosphorylation or deglycosylation, ionic concentrations, concentration of various metals, osmolarity, and/or the presence or absence of certain exogenous chemical agents or ligands. If desired, the intein can be inserted into a region of the target protein such that the bioactivity of the target protein is substantially inactivated. Accordingly, the bioactivity of the target polypeptide may be turned “on” or “off” on demand.

[0021] Other aspects of the invention are described below or will be apparent to those skilled in the art in light of the present disclosure.

4.2. Definitions

[0022] For convenience, the meanings of certain terms and phrases employed in the specification, examples, and appended claims are provided below.

[0023] As used herein, the terms “biological activity,” “bioactivity,” “activity” or “biological function” of a polypeptide or target polypeptide, are used interchangeably and refer to the catalytic, signaling, structural or other biological function of the given polypeptide. Biological activities include, for example, binding to a target peptide, e.g., the binding of a hormone receptor to a hormone. As used herein the term “bioactivity” may correspond to any catalytic activity of a polypeptide such as a kinase activity, a ligase activity, a phosphatase activity, a protease activity, or a polymerase activity. Subject “bioactivities” further include polypeptide sequences which function as protein, nucleic acid, lipid or small molecule recognition domains such as an antigenic determinant, a phosphorylation site, a DNA binding domain, an RNA binding domain, a secretion signal, a nuclear localization signal, a glycosylation site, a myristoylation site, a homodimerization or heterodimerization domain or other protein interaction domain such as can be identified by the skilled artisan using two-hybrid interaction screening or polypeptide display panning methodologies.

[0024] The term “biomarker” refers to a biological molecule, e.g., a nucleic acid, peptide, hormone, etc., whose presence or concentration can be detected and correlated with a known condition, such as a disease state.

[0025] “Cells”, “host cells” or “recombinant host cells” are terms used interchangeably herein. It is understood that such terms refer not only to the particular subject cell but to the progeny or potential progeny of such a cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term as used herein.

[0026] The term “chimeric polypeptide” refers generally to a polypeptide comprising two subunits which do not occur together in the same polypeptide in nature, or at least, if present within the same polypeptide in nature, wherein the subunits do not occur in the same order in nature as in the chimeric polypeptide. When referring to the chimeric polypeptide of the invention, the term refers to a polypeptide comprising at least two functional subunits, a first functional subunit comprising portions of a target protein, and a second functional subunit which comprises a protein intron or intein. The terms “chimeric polypeptide” or “fusion polypeptide” or “hybrid polypeptide,” as used herein interchangeably, refer to a covalent joining of a first amino acid sequence encoding an intein polypeptide with a second amino acid sequence defining a target polypeptide. In general, an intein fusion polypeptide can be represented by the general formula N-INT-C, wherein INT represents a wild-type intein with constitutive autoexcision activity or a conditional intein derivative with inducible autoexcision activity and N and C refer to amino- and carboxy-terminal fragments of the target polypeptide respectively. Trans-spliced embodiments of the invention employ two hybrid polypeptides which can be represented by the general formulae N-INT^(N) and INT^(C)-C, wherein INT^(N) comprises an amino-terminal fragment of an intein and INT^(C) comprises a carboxy-terminal fragment of an intein.
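By way of illustration only, the general formula N-INT-C can be sketched as a simple data structure. The following Python sketch is not part of the disclosed methods; the class name InteinFusion, the function insert_intein, and the short sequences used are hypothetical placeholders chosen for the example, and splicing is modeled purely as string manipulation with no attempt to represent the chemistry or its regulation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class InteinFusion:
    """Toy model of a chimeric polypeptide of the general formula N-INT-C."""
    n_extein: str  # amino-terminal fragment (N) of the target polypeptide
    intein: str    # inserted intein sequence (INT)
    c_extein: str  # carboxy-terminal fragment (C) of the target polypeptide

    def precursor(self) -> str:
        """Return the unspliced fusion polypeptide, N-INT-C."""
        return self.n_extein + self.intein + self.c_extein

    def splice(self) -> Tuple[str, str]:
        """Return (ligated exteins, excised intein) after excision."""
        return self.n_extein + self.c_extein, self.intein


def insert_intein(target: str, intein: str, position: int) -> InteinFusion:
    """Split `target` after `position` residues and place `intein` between the halves."""
    if not 0 < position < len(target):
        raise ValueError("position must fall strictly inside the target sequence")
    return InteinFusion(target[:position], intein, target[position:])


# Hypothetical placeholder sequences, not taken from any figure or SEQ ID.
fusion = insert_intein("MKTAYIAK", "CFAKGTNVLMADGSHN", 4)
mature, excised = fusion.splice()
```

In this sketch the ligated exteins reproduce the original target sequence exactly, which mirrors the statement that excision yields native protein.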

[0027] A “delivery complex” shall mean a targeting means (e.g. a molecule that results in higher affinity binding of a gene, protein, polypeptide or peptide to a target cell surface and/or increased cellular or nuclear uptake by a target cell). Examples of targeting means include: sterols (e.g. cholesterol), lipids (e.g. a cationic lipid, virosome or liposome), viruses (e.g. adenovirus, adeno-associated virus, and retrovirus) or target cell specific binding agents (e.g. ligands recognized by target cell specific receptors). Preferred complexes are sufficiently stable in vivo to prevent significant uncoupling prior to internalization by the target cell. However, the complex is cleavable under appropriate conditions within the cell so that the gene, protein, polypeptide or peptide is released in a functional form.

[0028] The term “equivalent” is understood to include nucleotide sequences encoding functionally equivalent polypeptides. Equivalent nucleotide sequences will include sequences that differ by one or more nucleotide substitutions, additions or deletions, such as allelic variants; and will, therefore, include sequences that differ from the nucleotide sequence of the nucleic acids shown in, for example, SEQ ID No. 1 due to the degeneracy of the genetic code. “Equivalent polypeptides” of the invention are understood to include polypeptides related to those disclosed by one or more amino acid substitutions corresponding to conservative changes (i.e. those changes observed frequently within evolutionarily divergent homologs). The “equivalent polypeptides” of the invention further include equivalent conditional intein polypeptides, such as those obtained by altering any known intein polypeptide sequence so as to correspond to the mutant conditional intein sequences disclosed herein.
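As an illustration of equivalence arising from the degeneracy of the genetic code, the following Python sketch checks whether two coding sequences translate to the same polypeptide. It assumes the standard genetic code and in-frame DNA sequences; the function names translate and encode_same_polypeptide and the example sequences are hypothetical and are not drawn from any SEQ ID.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering of codons.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def encode_same_polypeptide(dna_a: str, dna_b: str) -> bool:
    """True if both sequences translate to the same amino acid sequence."""
    return translate(dna_a) == translate(dna_b)


# GCT and GCC both encode alanine, so these sequences are equivalent here.
same = encode_same_polypeptide("ATGGCTTAA", "ATGGCCTAA")
```

This covers only the degeneracy case; equivalents arising from conservative amino acid substitutions would need a separate, similarity-based comparison.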

[0029] The term “extein” refers to a segment of a target polypeptide which is joined to an intein sequence. An N-extein is an amino-terminal portion of a target polypeptide which is joined at its carboxy-terminal end to an intein polypeptide. A C-extein is a carboxy-terminal portion of the target polypeptide which is joined at its amino-terminal end to an intein polypeptide. As used herein, the term “extein” is used in reference to both nucleic acid sequences which encode the amino-terminal and carboxy-terminal portion of the target polypeptides as well as the encoded target polypeptide segments themselves. Typically, subject exteins of the invention are produced as chimeric polypeptides having the general formula N-Extein/Intein/C-Extein. The term “heterologous” or expressions “heterologous protein” or “heterologous target,” as used herein, refer to any polypeptide sequence encoding a bioactivity to be regulated by a subject regulatable intein, and which polypeptide sequence does not occur in nature as an intein chimeric protein of the particular structure or sequence to be used in the method of the present invention. Thus subject heterologous proteins generally encode any “bioactivity” to be regulated by a regulatable intein. Preferred heterologous targets are mammalian proteins, particularly human proteins.

[0030] “Homology” or “identity” or “similarity” refers to sequence similarity between two peptides or between two nucleic acid molecules. Homology can be determined by comparing a position in each sequence which may be aligned for purposes of comparison. When a position in the compared sequence is occupied by the same base or amino acid, then the molecules are identical at that position. A degree of homology or similarity or identity between nucleic acid sequences is a function of the number of identical or matching nucleotides at positions shared by the nucleic acid sequences. A degree of identity of amino acid sequences is a function of the number of identical amino acids at positions shared by the amino acid sequences. A degree of homology or similarity of amino acid sequences is a function of the number of similar, i.e. structurally related, amino acids at positions shared by the amino acid sequences. An “unrelated” or “non-homologous” sequence shares less than 40% identity, though preferably less than 25% identity, with one of the target protein sequences of the present invention.

[0031] As used herein the terms “percent homology” or “percent identity” refer to degrees of similarity between two or more nucleic acids or two or more polypeptides which are defined by various mathematical algorithms which have been developed in the art. For example, percent identity can be determined by comparing a position in each sequence which may be aligned for purposes of comparison. When an equivalent position in the compared sequences is occupied by the same base or amino acid, then the molecules are identical at that position; when the equivalent site is occupied by the same or a similar amino acid residue (e.g., similar in steric and/or electronic nature), then the molecules can be referred to as homologous (similar) at that position. Expression as a percentage of homology, similarity, or identity refers to a function of the number of identical or similar amino acids at positions shared by the compared sequences. Various alignment algorithms and/or programs may be used, including FASTA, BLAST, or ENTREZ. FASTA and BLAST are available as a part of the GCG sequence analysis package (University of Wisconsin, Madison, Wis.), and can be used with, e.g., default settings. ENTREZ is available through the National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Md. In one embodiment, the percent identity of two sequences can be determined by the GCG program with a gap weight of 1, e.g., each amino acid gap is weighted as if it were a single amino acid or nucleotide mismatch between the two sequences.
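The gap-weight-of-1 calculation described above can be illustrated with a short Python sketch. This is not the GCG program; it is a minimal reading of the stated rule, assuming the two sequences have already been aligned to equal length with a gap character, and counting each gapped position as a single mismatch. The function name percent_identity, the gap character, and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap_char: str = "-") -> float:
    """Percent identity of two pre-aligned sequences, each gap scored as one mismatch."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap_char and res_b == gap_char:
            continue  # a column that is gapped in both sequences carries no information
        compared += 1
        if res_a == res_b:
            identical += 1  # same base or amino acid at this aligned position
    if compared == 0:
        raise ValueError("no comparable positions in the alignment")
    return 100.0 * identical / compared


# Six of the eight compared positions are identical; one gap and one substitution differ.
identity = percent_identity("MKT-AYIA", "MKTGAYLA")  # 75.0
```

A percent similarity score could be obtained the same way by replacing the equality test with a lookup in a table of residues considered structurally related.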

[0032] Other techniques for alignment are described in Methods in Enzymology, vol. 266: Computer Methods for Macromolecular Sequence Analysis (1996), ed. Doolittle, Academic Press, Inc., a division of Harcourt Brace & Co., San Diego, Calif., USA. Preferably, an alignment program that permits gaps in the sequence is utilized to align the sequences. The Smith-Waterman algorithm is one type of algorithm that permits gaps in sequence alignments. See Meth. Mol. Biol. 70: 173-187 (1997). Also, the GAP program using the Needleman and Wunsch alignment method can be utilized to align sequences. An alternative search strategy uses MPSRCH software, which runs on a MASPAR computer. MPSRCH uses a Smith-Waterman algorithm to score sequences on a massively parallel computer. This approach improves the ability to pick up distantly related matches, and is especially tolerant of small gaps and nucleotide sequence errors. Nucleic acid-encoded amino acid sequences can be used to search both protein and DNA databases.
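For orientation, a minimal Python sketch of Smith-Waterman local alignment follows. It is a textbook form of the algorithm with a linear gap penalty, not the MPSRCH or GCG implementation; the function name smith_waterman and the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for the example, and a practical protein search would normally use a substitution matrix instead.

```python
from typing import Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Return (best local score, aligned segment of a, aligned segment of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; scores are floored at zero so an alignment can start anywhere.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap introduced in b
            left = score[i][j - 1] + gap    # gap introduced in a
            score[i][j] = max(0, diag, up, left)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# The shared core "ACG" is recovered from otherwise unrelated flanking sequence.
result = smith_waterman("TTTACGTTT", "CCACGCC")  # (6, "ACG", "ACG")
```

The aligned segments returned here could be passed to a percent identity calculation to score the matched region.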

[0033] Databases with individual sequences are described in Methods in Enzymology, ed. Doolittle, supra. Databases include Genbank, EMBL, and DNA Database of Japan (DDBJ).

[0034] “Inteins” or “protein introns” of this invention include intron-like elements that are removed post-translationally from the target protein in which they are embedded in-frame, by self-splicing. In other words, inteins are splicing elements that occur naturally as in-frame protein fusions; these inteins are not removed from RNA transcripts, but are translated in-frame as part of the target protein in which they are inserted. Self-excision of the intein is followed by ligation of the two external remaining sequences of the target protein to produce an active functional protein. The external target sequences are called exteins. The term intein, as used herein, includes within its scope naturally occurring isolated and/or purified intein polypeptides, fragments comprising intein elements minimally required for self-splicing, for example inteins comprising the N- and C-terminal domains of the inteins linked with a linker moiety, trans-spliced inteins, synthetically designed inteins, and condition-sensitive mutants. The term includes both naturally occurring inteins as well as recombinant or synthetic inteins. As used herein, the term intein includes the nucleic acids encoding the autonomous polypeptides and the polypeptide itself.

[0035] The term “interact” as used herein is meant to include detectable relationships or association (e.g. biochemical interactions) between molecules, such as interaction between protein-protein, protein-nucleic acid, nucleic acid-nucleic acid, and protein-small molecule or nucleic acid-small molecule in nature. An interaction can be direct or indirect, i.e., mediated by another molecule. Two molecules interacting directly are also referred to as binding to each other.

[0036] The term “isolated” as used herein with respect to nucleic acids, such as DNA or RNA, refers to molecules separated from other DNAs, or RNAs, respectively, that are present in the natural source of the macromolecule. For example, an isolated nucleic acid encoding one of the subject intein polypeptides preferably includes no more than 10 kilobases (kb) of nucleic acid sequence which naturally immediately flanks the intein coding sequence DNA, more preferably no more than 5 kb of such naturally occurring cDNA or genomic flanking sequences, and most preferably less than 1.5 kb of such flanking sequence. The term isolated as used herein also refers to a nucleic acid or peptide that is substantially free of cellular material, viral material, or culture medium when produced by recombinant DNA techniques, or chemical precursors or other chemicals when chemically synthesized. Moreover, an “isolated nucleic acid” is meant to include nucleic acid fragments which are not naturally occurring as fragments and would not be found in the natural state. The term “isolated” is also used herein to refer to polypeptides which are isolated from other cellular proteins and is meant to encompass both purified and recombinant polypeptides.

[0037] A “knock-in” transgenic animal refers to an animal that has had a modified gene introduced into its genome and the modified gene can be of exogenous or endogenous origin. In preferred embodiments, a regulatable intein is inserted or “knocked-into” a target gene of the transgenic animal so as to render one or more bioactivities encoded by the target gene polypeptide subject to regulation by controlled intein excision.

[0038] A “knock-out” transgenic animal refers to an animal in which there is partial or complete suppression of the expression of an endogenous gene (e.g., based on deletion of at least a portion of the gene, replacement of at least a portion of the gene with a second sequence, introduction of stop codons, the mutation of bases encoding critical amino acids, or the removal of an intron junction, etc.). In preferred embodiments, the “knock-out” gene locus corresponding to the modified endogenous gene no longer encodes a functional polypeptide activity and is said to be a “null” allele. Accordingly, knock-out transgenic animals of the present invention include those carrying one target gene null mutation, i.e. target gene null allele heterozygous animals, and those carrying two target gene null mutations, such as target gene null allele homozygous animals.

[0039] A “knock-out construct” refers to a nucleic acid sequence that can be used to decrease or suppress expression of a protein encoded by endogenous DNA sequences in a cell. In a simple example, the knock-out construct is comprised of a hypothetical target gene with a deletion in a critical portion of the gene so that active protein cannot be expressed therefrom. Alternatively, a number of termination codons can be added to the native gene to cause early termination of the protein or an intron junction can be inactivated. In a typical knock-out construct, some portion of the gene is replaced with a selectable marker (such as the neo gene) so that the gene can be represented as follows: TARGET 5′/neo/TARGET 3′, where TARGET 5′ and TARGET 3′ refer to genomic or cDNA sequences which are, respectively, upstream and downstream relative to a portion of the TARGET gene and where neo refers to a neomycin resistance gene. In another knock-out construct, a second selectable marker is added in a flanking position so that the gene can be represented as: TARGET/neo/TARGET/TK, where TK is a thymidine kinase gene which can be added to either the TARGET 5′ or the TARGET 3′ sequence of the preceding construct and which further can be selected against (i.e. is a negative selectable marker) in appropriate media. This two-marker construct allows the selection of homologous recombination events, which remove the flanking TK marker, from non-homologous recombination events which typically retain the TK sequences. The gene deletion and/or replacement can be from the exons, introns, especially intron junctions, and/or the regulatory regions such as promoters.

[0040] The term “modulation” as used herein refers to both upregulation (i.e., activation or stimulation (e.g., by agonizing or potentiating)) and downregulation (i.e. inhibition or suppression (e.g., by antagonizing, decreasing or inhibiting)) of an activity and, preferably, a polypeptide bioactivity.

[0041] The term “mutated gene” refers to an allelic form of a gene, which is capable of altering the phenotype of a subject having the mutated gene relative to a subject which does not have the mutated gene. If a subject must be homozygous for this mutation to have an altered phenotype, the mutation is said to be recessive. If one copy of the mutated gene is sufficient to alter the phenotype of the subject, the mutation is said to be dominant. If a subject has one copy of the mutated gene and has a phenotype that is intermediate between that of a homozygous and that of a heterozygous subject (for that gene), the mutation is said to be co-dominant.

[0042] The “non-human animals” of the invention include mammals such as rodents, non-human primates, sheep, dog and cow, as well as chickens, amphibians, reptiles, etc. Preferred non-human animals are selected from the rodent family including rat and mouse, most preferably mouse, though transgenic amphibians, such as members of the Xenopus genus, and transgenic chickens can also provide important tools for understanding and identifying agents which can affect, for example, embryogenesis and tissue formation. The term “chimeric animal” is used herein to refer to animals in which the recombinant gene is found, or in which the recombinant gene is expressed in some but not all cells of the animal. The term “tissue-specific chimeric animal” indicates that one of the recombinant genes, e.g., a gene encoding a chimeric polypeptide, is present and/or expressed or disrupted in some tissues but not others.

[0043] As used herein, the term “nucleic acid” refers to polynucleotides such as deoxyribonucleic acid (DNA), and, where appropriate, ribonucleic acid (RNA). The term should also be understood to include, as equivalents, analogs of either RNA or DNA made from nucleotide analogs, and, as applicable to the embodiment being described, single (sense or antisense) and double-stranded polynucleotides.

[0044] The term “nucleotide sequence complementary to the nucleotide sequence set forth in SEQ ID No. x” refers to the nucleotide sequence of the complementary strand of a nucleic acid strand having SEQ ID No. x. The term “complementary strand” is used herein interchangeably with the term “complement”. The complement of a nucleic acid strand can be the complement of a coding strand or the complement of a non-coding strand. When referring to double stranded nucleic acids, the complement of a nucleic acid having SEQ ID No. x refers to the complementary strand of the strand having SEQ ID No. x or to any nucleic acid having the nucleotide sequence of the complementary strand of SEQ ID No. x. When referring to a single stranded nucleic acid having the nucleotide sequence SEQ ID No. x, the complement of this nucleic acid is a nucleic acid having a nucleotide sequence which is complementary to that of SEQ ID No. x. The nucleotide sequences and complementary sequences thereof are always given in the 5′ to 3′ direction.
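Because both a sequence and its complement are written 5′ to 3′, the complement of a strand is obtained by complementing each base and then reversing the result. The following Python sketch illustrates this for unambiguous DNA bases only; the function name complement_5_to_3 and the example sequence are hypothetical and do not correspond to any SEQ ID.

```python
# Watson-Crick pairing for unambiguous DNA bases, preserving letter case.
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement_5_to_3(seq: str) -> str:
    """Return the complementary strand of `seq`, written in the 5' to 3' direction."""
    unknown = set(seq) - set("ACGTacgt")
    if unknown:
        raise ValueError("unsupported characters: " + "".join(sorted(unknown)))
    # Complement each base, then reverse so the result also reads 5' to 3'.
    return seq.translate(_DNA_COMPLEMENT)[::-1]


# Hypothetical example sequence.
complement = complement_5_to_3("ATGCCGTAA")  # "TTACGGCAT"
```

Applying the function twice returns the original sequence, which is a convenient check that the orientation convention has been respected.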

[0045] The term “percent identical” refers to sequence identity between two amino acid sequences or between two nucleotide sequences. Identity can be determined by comparing a position in each sequence which may be aligned for purposes of comparison. When an equivalent position in the compared sequences is occupied by the same base or amino acid, then the molecules are identical at that position; when the equivalent site is occupied by the same or a similar amino acid residue (e.g., similar in steric and/or electronic nature), then the molecules can be referred to as homologous (similar) at that position. Expression as a percentage of homology, similarity, or identity refers to a function of the number of identical or similar amino acids at positions shared by the compared sequences. Various alignment algorithms and/or programs may be used, including FASTA, BLAST, or ENTREZ. FASTA and BLAST are available as a part of the GCG sequence analysis package (University of Wisconsin, Madison, Wis.), and can be used with, e.g., default settings. ENTREZ is available through the National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Md. In one embodiment, the percent identity of two sequences can be determined by the GCG program with a gap weight of 1, e.g., each amino acid gap is weighted as if it were a single amino acid or nucleotide mismatch between the two sequences.
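As a minimal sketch of the percent identity calculation described above, and assuming the two sequences have already been aligned (with “-” marking a gap), each gap position can be counted like a single mismatch. This is not the GCG program itself; the function name and the example alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences.

    Assumptions: both strings have equal length, "-" marks a gap, and a
    gap position counts like one mismatch (gap weight of 1).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Made-up alignment: 4 identical positions out of 6 columns.
    print(round(percent_identity("ACGT-A", "ACCTGA"), 1))
```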

[0046] Other techniques for alignment are described in Methods in Enzymology, vol. 266: Computer Methods for Macromolecular Sequence Analysis (1996), ed. Doolittle, Academic Press, Inc., a division of Harcourt Brace & Co., San Diego, Calif., USA. Preferably, an alignment program that permits gaps in the sequence is utilized to align the sequences. The Smith-Waterman algorithm is one type of algorithm that permits gaps in sequence alignments. See Meth. Mol. Biol. 70: 173-187 (1997). Also, the GAP program using the Needleman and Wunsch alignment method can be utilized to align sequences. An alternative search strategy uses MPSRCH software, which runs on a MASPAR computer. MPSRCH uses a Smith-Waterman algorithm to score sequences on a massively parallel computer. This approach improves the ability to pick up distantly related matches, and is especially tolerant of small gaps and nucleotide sequence errors. Nucleic acid-encoded amino acid sequences can be used to search both protein and DNA databases.
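For illustration, a gapped local alignment score of the Smith-Waterman type can be sketched as below. The scoring values (match +2, mismatch -1, linear gap penalty -2) are assumptions chosen for the example and are not those of any program named above; the function name is likewise hypothetical.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of `a` and `b` (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: this is what makes the
            # alignment local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Made-up sequences sharing the local segment "ACG".
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Only the score is returned; recovering the aligned residues would additionally require a traceback through the full matrix.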

[0047] Databases with individual sequences are described in Methods in Enzymology, ed. Doolittle, supra. Databases include Genbank, EMBL, and DNA Database of Japan (DDBJ).

[0048] Preferred nucleic acids have a sequence at least 70%, and more preferably 80% identical and more preferably 90% and even more preferably at least 95% identical to a nucleic acid sequence shown in one of the sequence listings. Nucleic acids at least 90%, more preferably 95%, and most preferably at least about 98-99% identical with a nucleic acid sequence represented in one of the sequence listings are of course also within the scope of the invention. In preferred embodiments, the nucleic acid is mammalian. In comparing a new nucleic acid with known sequences, several alignment tools are available. Examples include PileUp, which creates a multiple sequence alignment, and is described in Feng et al., J. Mol. Evol. (1987) 25:351-360. Another method, GAP, uses the alignment method of Needleman et al., J. Mol. Biol. (1970) 48: 443-453. GAP is best suited for global alignment of sequences. A third method, BestFit, functions by inserting gaps to maximize the number of matches using the local homology algorithm of Smith and Waterman, Adv. Appl. Math. (1981) 2:482-489.

[0049] A “polymorphic gene” refers to a gene having at least one polymorphic region.

[0050] The term “polymorphism” refers to the coexistence of more than one form of a gene or portion (e.g., allelic variant) thereof. A portion of a gene of which there are at least two different forms, i.e., two different nucleotide sequences, is referred to as a “polymorphic region of a gene”. A polymorphic region can be a single nucleotide, the identity of which differs in different alleles. A polymorphic region can also be several nucleotides long.

[0051] As used herein, the term “promoter” means a DNA sequence that regulates expression of a selected DNA sequence operably linked to the promoter, and which effects expression of the selected DNA sequence in cells. The term encompasses “tissue specific” promoters, i.e., promoters which effect expression of the selected DNA sequence only in specific cells (e.g., cells of a specific tissue). The term also covers so-called “leaky” promoters, which regulate expression of a selected DNA primarily in one tissue, but cause expression in other tissues as well. The term also encompasses non-tissue specific promoters and promoters that constitutively express or that are inducible (i.e., expression levels can be controlled).

[0052] The terms “protein”, “polypeptide” and “peptide” are used interchangeably herein when referring to a gene product. The term polypeptide includes peptidomimetics.

[0053] The term “recombinant protein” refers to a polypeptide of the present invention which is produced by recombinant DNA techniques, wherein generally, DNA encoding a polypeptide is inserted into a suitable expression vector which is in turn used to transform a host cell to produce the heterologous protein. Moreover, the phrase “derived from”, with respect to a recombinant gene, is meant to include within the meaning of “recombinant protein” those proteins having an amino acid sequence of a native polypeptide, or an amino acid sequence similar thereto which is generated by mutations including substitutions and deletions (including truncation) of a naturally occurring form of the polypeptide.

[0054] The term “regulation” as used herein refers to both upregulation (i.e., activation or stimulation (e.g., by agonizing or potentiating)) and downregulation (i.e., inhibition or suppression (e.g., by antagonizing, decreasing or inhibiting)).

[0055] The term “signal” as used herein refers to any chemical, physical or energetic agent which can be used to alter the autoexcision activity of the subject regulatable inteins. Examples of signals contemplated in the instant invention include: temperature changes (either increases or decreases in temperature); pH changes; changes in salt concentration; changes in ionic strength; exposure to electromagnetic radiation; and changes in pressure. Subject signals of the invention further include chemical signals such as signals produced by the addition or removal of: a chemical ligand (preferably a bivalent dimerizing agent); a metal ion; a carbohydrate moiety; a lipid moiety; a nucleic acid; or a polypeptide.

[0056] “Small molecule” as used herein, is meant to refer to a composition which has a molecular weight of less than about 5 kD and most preferably less than about 4 kD. Small molecules can be nucleic acids, peptides, polypeptides, peptidomimetics, carbohydrates, lipids or other organic (carbon containing) or inorganic molecules. Many pharmaceutical companies have extensive libraries of chemical and/or biological mixtures, often fungal, bacterial, or algal extracts, which can be screened with any of the assays of the invention, e.g., to identify compounds that modulate the interaction between two polypeptides.

[0057] As used herein, the term “specifically hybridizes” or “specifically detects” refers to the ability of a nucleic acid molecule to hybridize to at least approximately 6, 12, 20, 30, 50, 100, 150, 200, 300, 350, 400 or 425 consecutive nucleotides of a nucleic acid.

[0058] The term “statistically significant” as used herein refers to a measurement which is not the result of random variation or sampling error. For example, the expression “statistically significant change in bioactivity” refers to an increase or decrease of at least about 50% in the value of a particular bioactivity measurement. The bioactivity measurement may refer to, for example, a rate of catalysis or a phenotypic measure of biological complementation. For example, statistically significant increases in growth on galactose (as reflected, e.g., by colony size) of a yeast gal4 GAL4:intein strain in contact with a test compound (as compared to growth in the absence of said compound) identify suitable intein self-excision agonists, while statistically significant decreases in growth on galactose of this strain when in contact with a test compound identify suitable intein self-excision antagonists.
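By way of example only, the “at least about 50%” criterion described above might be applied to a pair of growth measurements as sketched below. The function names, the fixed 0.5 threshold, and the numeric values are assumptions made for illustration; no measured data are implied.

```python
def relative_change(baseline: float, treated: float) -> float:
    """Fractional change of `treated` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline measurement must be positive")
    return (treated - baseline) / baseline


def classify_test_compound(baseline: float, treated: float,
                           threshold: float = 0.5) -> str:
    """Label a compound from growth without (baseline) and with it (treated).

    An increase of at least `threshold` suggests a self-excision agonist,
    a decrease of at least `threshold` suggests an antagonist.
    """
    change = relative_change(baseline, treated)
    if change >= threshold:
        return "agonist"
    if change <= -threshold:
        return "antagonist"
    return "no significant effect"


if __name__ == "__main__":
    # Hypothetical colony-size readings in arbitrary units.
    print(classify_test_compound(1.0, 1.6))
    print(classify_test_compound(1.0, 0.4))
```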

[0059] The term “target cell” refers to a cell comprising a target polypeptide, the regulation of the bioactivity of which is desired.

[0060] The term “target polypeptide” refers to a polypeptide, the bioactivity of which is to be regulated. The target protein may comprise one or more intein sequences.

[0061] “Transcriptional regulatory sequence” is a generic term used throughout the specification to refer to DNA sequences, such as initiation signals, enhancers, and promoters, which induce or control transcription of protein coding sequences with which they are operably linked. In preferred embodiments, transcription of a nucleic acid encoding a chimeric polypeptide of the invention is under the control of a promoter sequence (or other transcriptional regulatory sequence) which controls the expression of the recombinant gene in a cell-type in which expression is intended.

[0062] As used herein, the term “transfection” means the introduction of a nucleic acid, e.g., via an expression vector, into a recipient cell by nucleic acid-mediated gene transfer. “Transformation”, as used herein, refers to a process in which a cell's genotype is changed as a result of the cellular uptake of exogenous DNA or RNA, and, for example, the transformed cell expresses a recombinant form of a target polypeptide or, in the case of anti-sense expression from the transferred gene, the expression of a naturally-occurring form of the target polypeptide is disrupted.

[0063] As used herein, the term “transgene” means a nucleic acid sequence (encoding, e.g., a chimeric polypeptide of the invention) which has been introduced into a cell. A transgene could be partly or entirely heterologous, i.e., foreign, to the transgenic animal or cell into which it is introduced, or is homologous to an endogenous gene of the transgenic animal or cell into which it is introduced, but which is designed to be inserted, or is inserted, into the animal's genome in such a way as to alter the genome of the cell into which it is inserted (e.g., it is inserted at a location which differs from that of the natural gene or its insertion results in a knockout). A transgene can also be present in a cell in the form of an episome. A transgene can include one or more transcriptional regulatory sequences and any other nucleic acid, such as introns, that may be necessary for optimal expression of a selected nucleic acid.

[0064] A “transgenic animal” refers to any animal, preferably a non-human mammal, bird or an amphibian, in which one or more of the cells of the animal contain heterologous nucleic acid introduced by way of human intervention, such as by transgenic techniques well known in the art. The nucleic acid is introduced into the cell, directly or indirectly by introduction into a precursor of the cell, by way of deliberate genetic manipulation, such as by microinjection or by infection with a recombinant virus. The term genetic manipulation does not include classical cross-breeding, or in vitro fertilization, but rather is directed to the introduction of a recombinant DNA molecule. This molecule may be integrated within a chromosome, or it may be extrachromosomally replicating DNA. In the typical transgenic animals described herein, the transgene causes cells to express a chimeric polypeptide or other polypeptide of interest. However, transgenic animals in which the recombinant chimeric gene is silent are also contemplated, as for example, the FLP or CRE recombinase dependent constructs. Moreover, “transgenic animal” also includes those recombinant animals in which gene disruption of one or more genes is caused by human intervention, including both recombination and antisense techniques.

[0065] The term “treating” as used herein is intended to encompass curing as well as ameliorating at least one symptom of the condition or disease.

[0066] The term “vector” refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. One type of preferred vector is an episome, i.e., a nucleic acid capable of extra-chromosomal replication. Preferred vectors are those capable of autonomous replication and/or expression of nucleic acids to which they are linked. Vectors capable of directing the expression of genes to which they are operatively linked are referred to herein as “expression vectors”. In general, expression vectors of utility in recombinant DNA techniques are often in the form of “plasmids”, which refer generally to circular double stranded DNA loops which, in their vector form, are not bound to the chromosome. In the present specification, “plasmid” and “vector” are used interchangeably as the plasmid is the most commonly used form of vector. However, the invention is intended to include such other forms of expression vectors which serve equivalent functions and which become known in the art subsequently hereto.

[0067] A “viral vector” refers to a nucleic acid containing at least a portion of a viral genome sufficient for replication and packaging in the presence of an appropriate helper virus and appropriate cell line or packaging extract. For example, by an “AAV vector” is meant a vector derived from an adeno-associated virus serotype, including without limitation, AAV-1, AAV-2, AAV-3, AAV-4, AAV-5, AAVX7, etc. AAV vectors can have one or more of the AAV wild-type genes deleted in whole or part, preferably the rep and/or cap genes, but retain functional flanking ITR sequences. Functional ITR sequences are necessary for the rescue, replication and packaging of the AAV virion. Thus, an AAV vector is defined herein to include at least those sequences required in cis for replication and packaging (e.g., functional ITRs) of the virus. The ITRs need not be the wild-type nucleotide sequences, and may be altered, e.g., by the insertion, deletion or substitution of nucleotides, so long as the sequences provide for functional rescue, replication and packaging.

[0068] By “virion” or “viral particle” is meant a complete virus particle, such as a wild-type (wt) virus particle (comprising a nucleic acid genome associated with a capsid protein coat), or a recombinant virus particle as described below. For example, by “adenoviral virion” is meant a complete virus particle, such as a wild-type (wt) Ad virus particle comprising an Ad nucleic acid genome associated with an Ad capsid protein coat, or a recombinant AAV virus particle as described below. In this regard, single-stranded AAV nucleic acid molecules of either complementary sense, e.g., “sense” or “antisense” strands, can be packaged into any one AAV virion and both strands are equally infectious.

4.3. Polypeptides and Nucleic Acids of the Present Invention

[0069] Inteins are a group of related protein elements found within a range of host proteins immediately after their translation. After translation, the intein self-splices out of, or “autoexcises” itself from, the host (target) protein. After autoexcision, the amino-terminal target protein fragment and carboxy-terminal target protein fragment are joined so as to result in a functional target protein and an autonomous intein (see FIG. 1). These amino- and carboxy-terminal fragments of the host protein that become part of the mature functional protein are frequently referred to as “exteins”; the extein fragment that is C-terminal to the end of the intein is referred to as the C-extein, and the fragment that is to the N-terminal side of the intein is referred to as the N-extein. There are at least forty known naturally occurring inteins. In fact, these inteins have been compiled in a comprehensive on-line database by New England Biolabs (http://www.neb.com/neb/inteins.html).

[0070] The inteins of this invention may be at least about 100-500 amino acids in length. In one embodiment, the intein is about 450 amino acids in length. In another embodiment, the intein is about 400 amino acids in length. In yet another embodiment, the intein is about 300 amino acids in length. In yet another embodiment, the intein is about 250 amino acids in length. In another embodiment, the intein is about 200 amino acids in length, or about 150 amino acid residues in length, or 100 amino acid residues in length. In a preferred embodiment, the intein is about 105 amino acids in length. Exemplary inteins of this invention include but are not limited to: the Sce VMA intein as shown in FIG. 5 (S. cerevisiae vacuolar ATPase subunit; GenBank Accession No. M21609) and corresponding to the polypeptide of SEQ ID No. 14 which is encoded by the nucleic acid of SEQ ID No. 13; the Ctr VMA intein as shown in FIG. 6 (Candida tropicalis vacuolar ATPase subunit; GenBank Accession No. M64984) and corresponding to the polypeptide of SEQ ID No. 16 which is encoded by the nucleic acid of SEQ ID No. 15; the Ceu clpP intein as shown in FIG. 7 (Chlamydomonas eugametos; GenBank Accession No. L29402) and corresponding to the polypeptide of SEQ ID No. 18 which is encoded by the nucleic acid of SEQ ID No. 17; and the Mtu recA intein as shown in FIG. 8 (Mycobacterium tuberculosis recA intein-containing gene, GenBank Accession No. X58485) and corresponding to the polypeptide sequence of SEQ ID No. 20 which is encoded by the nucleic acid sequence of SEQ ID No. 19.

[0071] In one embodiment, the inteins of this invention include a polypeptide which is encoded by a nucleotide sequence that hybridizes under stringent conditions to a nucleic acid sequence represented in one or more of SEQ ID Nos. 13, 15, 17 or 19. Appropriate stringency conditions which promote DNA hybridization, for example, 6.0× sodium chloride/sodium citrate (SSC) at about 45° C., followed by a wash of 2.0×SSC at 50° C., are known to those skilled in the art or can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. For example, the salt concentration in the wash step can be selected from a low stringency of about 2.0×SSC at 50° C. to a high stringency of about 0.2×SSC at 50° C. In addition, the temperature in the wash step can be increased from low stringency conditions at room temperature, about 22° C., to high stringency conditions at about 65° C.

[0072] In preferred embodiments the intein of the present invention is a conditional intein allele corresponding to an alteration of the “wild-type” Sce VMA intein shown in FIG. 4 (SEQ ID No. 1). For example, preferred inteins of the invention comprise at least one of the amino acid alterations associated with the temperature sensitive (TS) inteins TS1, TS4, TS7, TS8, TS10, TS15, TS17, TS18 or TS19 or the cold sensitive (CS) inteins CS1, CS2 or CS3 as shown in FIG. 4. In certain embodiments, the subject inteins correspond to the conditional alleles of the Saccharomyces cerevisiae VMA intein polypeptide sequence specified by SEQ ID Nos. 2-12. These amino acid alterations can be effected by site-directed mutagenesis of the Sce VMA intein-encoding nucleic acid sequence shown in FIG. 5 (SEQ ID No. 13) in view of the standard genetic code shown below.

AAs    = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = ---M---------------M---------------M----------------------------
Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

[0073] For example, the conditional intein TS1, corresponding to a leucine to proline alteration at Sce VMA amino acid residue 212, can be produced by mutating the codon CTT, which occurs beginning at nucleotide 1363 of SEQ ID No. 13, to CCT by a single T to C transition mutation effected through site-directed mutagenesis techniques which are known in the art (see, e.g., Costa et al. (1996) Methods Mol. Biol. 57: 239-48).
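Purely as an illustration, the CTT to CCT codon change described for TS1 can be checked against the standard genetic code as sketched below. The helper names are hypothetical, and the example operates on the isolated codon only, not on SEQ ID No. 13.

```python
from itertools import product

# Standard genetic code in the usual T, C, A, G ordering:
# first base varies slowest, third base fastest.
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def point_mutate(codon: str, index: int, new_base: str) -> str:
    """Return `codon` with the base at 0-based `index` replaced."""
    return codon[:index] + new_base + codon[index + 1:]


def is_transition(old: str, new: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return old != new and ({old, new} <= PURINES or {old, new} <= PYRIMIDINES)


if __name__ == "__main__":
    wild_type = "CTT"
    mutant = point_mutate(wild_type, 1, "C")  # second codon position
    print(wild_type, CODON_TABLE[wild_type], "->", mutant, CODON_TABLE[mutant])
    print("transition:", is_transition(wild_type[1], mutant[1]))
```

The substitution at the second codon position replaces T with C, both pyrimidines, and converts a leucine codon into a proline codon.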

[0074] In certain embodiments, the invention provides controllable intein-encoding nucleic acids, homologs thereof, and portions thereof. Preferred nucleic acids have a sequence at least about 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, and more preferably 85% homologous and more preferably 90% and more preferably 95% and even more preferably at least 99% homologous with a nucleotide sequence of an intein-encoding element, e.g., a sequence shown in one of SEQ ID Nos: 13, 15, 17 or 19 or complement thereof. Preferred embodiments include the intein-encoding nucleic acids having ATCC Designation No. ______, corresponding to TS1, ATCC Designation No. ______, corresponding to TS4, ATCC Designation No. ______, corresponding to TS8, ATCC Designation No. ______, corresponding to TS10, ATCC Designation No. ______, corresponding to TS15, ATCC Designation No. ______, corresponding to TS17, ATCC Designation No. ______, corresponding to TS18, ATCC Designation No. ______, corresponding to TS19, ATCC Designation No. ______, corresponding to CS1, ATCC Designation No. ______, corresponding to CS2 or ATCC Designation No. ______, corresponding to CS3. In preferred embodiments, the nucleic acid is from Saccharomyces cerevisiae and in particularly preferred embodiments, the nucleic acid comprises an insertion of the Sce VMA intein into the GAL4 coding sequence immediately before the third cysteine residue within the GAL4 DNA binding domain (GAL4 amino acid residue 20) and having the ATCC deposit Designation No. ______.

[0075] In certain embodiments, the allelic changes associated with multiple temperature sensitive alterations can be recombined into a single conditional intein polypeptide. For example, the TS1 allele corresponding to L212P described above can be combined with the amino acid alteration associated with the TS8 allele to yield an L212P, D324G double mutant conditional intein.
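By way of illustration, combining amino acid alterations written in the form “L212P” can be sketched as follows. The function name is hypothetical, and the short peptide used in the example is made up (it is not the Sce VMA intein sequence); with the actual sequence, the same call would take the substitutions L212P and D324G.

```python
import re


def apply_substitutions(sequence: str, substitutions: list[str]) -> str:
    """Apply substitutions such as "L212P" (1-based positions).

    Each substitution is checked against the residue actually present,
    so a mismatch between the notation and the sequence raises an error.
    """
    residues = list(sequence)
    for sub in substitutions:
        m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if m is None:
            raise ValueError(f"cannot parse substitution {sub!r}")
        old, pos, new = m.group(1), int(m.group(2)), m.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position {pos} is outside the sequence")
        if residues[pos - 1] != old:
            raise ValueError(f"expected {old} at {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = new
    return "".join(residues)


if __name__ == "__main__":
    toy_peptide = "MKLAVDE"  # made-up seven-residue peptide
    print(apply_substitutions(toy_peptide, ["L3P", "D6G"]))
```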

[0076] The present invention also provides probes/primers comprising a substantially purified oligonucleotide, wherein the oligonucleotide comprises a region of nucleotide sequence which hybridizes under stringent conditions to at least 10 consecutive nucleotides of sense or antisense sequence of SEQ ID No. 1 or naturally occurring mutants thereof. In preferred embodiments, the probe/primer further comprises a label group attached thereto and able to be detected, e.g., the label group is selected from a group consisting of radioisotopes, fluorescent compounds, enzymes, and enzyme co-factors.

[0077] In a further embodiment, the nucleic acid probe hybridizes under stringent conditions to a nucleic acid corresponding to at least 12 consecutive nucleotides of at least one of SEQ ID Nos. 13, 15, 17, 19 or 21; more preferably to at least 20 consecutive nucleotides of SEQ ID Nos. 13, 15, 17, 19 or 21; more preferably to at least 40 consecutive nucleotides of SEQ ID Nos. 13, 15, 17, 19 or 21.

[0078] In general, inteins contain about 10 conserved motifs, and these intein motifs can be grouped in three domains according to their location and inferred function (see Pietrokovski (1998) Protein Science 7: 64-71). These include an N-terminal domain, a C-terminal domain, and an endonuclease (EN) domain. The N- and C-domains are required for the self-splicing activity and the endonuclease domain is not required for this activity.

[0079] The N-domain includes six motifs and spans about 90-150 amino acids. Within the N-domain, motifs N2 and N4 are similar to each other and their main attribute is a conserved acidic residue usually preceded by a glycine. Motif N4 is more conserved than motif N2, being longer and less diverse. Nevertheless, the N2 motif is reliably assigned (P value 1·10⁻¹⁷; Schuler et al., 1991) and can be identified in almost all inteins. Motif N4 could not be identified in three of the four eukaryotic inteins, in inteins Tli pol-2, Mja pol-1, and their alleles, and in intein Mja PEPSyn.

[0080] The C-domain includes two motifs in the C-terminal region, spanning about 25-60 amino acids. A central EN-domain typically consists of four motifs. This domain is about 190-420 amino acids in size and is optional as far as splicing is concerned. Until now, this domain was only known to include motifs similar to those of dodecapeptide (DOD, LAGLI-DADG) homing endonucleases (Pietrokovski (1994) Protein Sci 3: 2340-50; Pietrokovski (1998) Protein Sci 7: 64-71; Perler et al. (1997) Nucleic Acids Res. 25: 1087-93). The central endonuclease domain is separated from the minimal splicing domains by variable spacers, for example, various peptide linkers.

[0081] Examples of conserved intein motifs are shown in the table below; this example includes the conserved motifs present in Sce VMA:

TABLE 1 Conserved Motifs Found In Inteins

Domain        Conserved Motif         SEQ ID NO
N1 Domain     CFAKGTNVLMADG           23
N2 Domain     IEVGNKV                 24
N3 Domain     LLKFTCNATHELVV          25
N4 Domain     WKLIDEIKPGDYAVLQ        26
EN1 Domain    LLGLWIGDG               27
EN2 Domain    VKNIPSFL                28
EN3 Domain    FLAGLIDSDG              29
EN4 Domain    TIHTSVRDGLVSLARSLGL     30
C1 Domain     NQVVVHNC                31
C2 Domain     YGITLSDDSDHQFL          32

[0082] In addition, variant forms, e.g., mutants of the subject inteins, are also contemplated as being equivalent to those peptides and DNA molecules that are set forth in more detail, as will be appreciated by those skilled in the art. For example, it is reasonable to expect that an isolated replacement of a leucine with an isoleucine or valine, an aspartate with a glutamate, a threonine with a serine, or a similar replacement of an amino acid with a structurally related amino acid (i.e., conservative mutations) will not have a major effect on the self-splicing activity of the resulting intein polypeptide. In any event, the residues which are essential for splicing are set forth in the section below.

[0083] Conservative replacements are those that take place within a family of amino acids that are related in their side chains. Genetically encoded amino acids can be divided into the following families: (1) acidic (a) = aspartate, glutamate; (2) basic (b) = lysine, arginine, histidine; (3) nonpolar = alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan; and (4) uncharged polar = glycine, asparagine, glutamine, cysteine, serine, threonine, tyrosine; alternatively, serine, threonine and cysteine may be classified separately as being polar amino acids (p); (5) phenylalanine, tryptophan, and tyrosine are sometimes classified jointly as aromatic amino acids (r); and (6) hydrophobic (h) = glycine, alanine, valine, leucine, isoleucine, and methionine.
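As a minimal sketch, and using only families (1) through (4) listed above, a replacement can be tested for membership in the same side-chain family as follows. The dictionary and function names are hypothetical, and the check says nothing about the actual effect of a replacement on self-splicing.

```python
# Families (1)-(4) as listed above, in one-letter amino acid codes.
FAMILIES = {
    "acidic": set("DE"),
    "basic": set("KRH"),
    "nonpolar": set("AVLIPFMW"),
    "uncharged polar": set("GNQCSTY"),
}


def family_of(residue: str) -> str:
    """Return the name of the family containing `residue`."""
    for name, members in FAMILIES.items():
        if residue in members:
            return name
    raise ValueError(f"unknown amino acid code {residue!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """True when both residues fall within the same family."""
    return family_of(original) == family_of(replacement)


if __name__ == "__main__":
    # The examples named in the text: Leu->Ile, Asp->Glu, Thr->Ser.
    for old, new in [("L", "I"), ("D", "E"), ("T", "S"), ("D", "G")]:
        print(old, "->", new, is_conservative(old, new))
```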

[0084] In similar fashion, the amino acid repertoire can be grouped as: (1) acidic = aspartate, glutamate; (2) basic = lysine, arginine, histidine; (3) aliphatic = glycine, alanine, valine, leucine, isoleucine, serine, threonine, with serine and threonine optionally being grouped separately as aliphatic-hydroxyl; (4) aromatic = phenylalanine, tyrosine, tryptophan; (5) amide = asparagine, glutamine; and (6) sulfur-containing = cysteine and methionine (see, for example, Biochemistry, 2nd ed., Ed. by L. Stryer, WH Freeman and Co.: 1981). Whether a change in the amino acid sequence of a peptide results in a functional homolog can be readily determined by assessing the ability of the variant peptide to produce a response in cells in a fashion similar to the wild-type protein.

[0085] Furthermore, based upon sequence alignment of various intein polypeptides known in the art, the conserved blocks may be represented by the following general formulas:

TABLE 2 General Formula for the Conserved Motifs Found In Inteins

Domain        Conserved Motif                                          SEQ ID NO
N1 Domain     CX₁X₂X₃DX₄X₅X₆X₇X₈X₉X₁₀G                                 33
N2 Domain     X₁₁X₁₂X₁₃GX₁₄X₁₅V                                        34
N3 Domain     GX₁₆X₁₇X₁₈X₁₉X₂₀TX₂₁X₂₂HX₂₃X₂₄X₂₅X₂₆                     35
N4 Domain     WX₂₇X₂₈X₂₉X₃₀X₃₁X₃₂X₃₃X₃₄X₃₅DX₃₆X₃₇X₃₈X₃₉X₄₀             36
EN1 Domain    LX₄₁GX₄₂X₄₃X₄₄X₄₅X₄₆G                                    37
EN2 Domain    X₄₇KX₄₈IPX₄₉X₅₀X₅₁                                       38
EN3 Domain    X₅₂LX₅₃GX₅₄FX₅₅X₅₆DG                                     39
EN4 Domain    X₅₇X₅₈5X₅₉X₆₀X₆₁X₆₂X₆₃X₆₄X₆₅X₆₆X₆₇LLX₆₈X₆₉X₇₀GI          40
C1 Domain     X₇₁VYDLX₇₂VX₇₃X₇₄X₇₅X₇₆X₇₇FX₇₈                           41
C2 Domain     NGX₇₉X₈₀X₈₁HNX₈₂                                         42

[0086] “X” is an amino acid which can be selected from amongst amino acid residues which would be conservative substitutions for the amino acids which appear naturally in each of those positions. For instance, conserved block N1 comprises the following amino acid residues: X1 belongs to class h as designated above, X2 and X3 can be any amino acid, X4 belongs to class p, X5 may be any amino acid, X6, X7, and X8 belong to class h, and X9 and X10 may be any amino acid.
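For illustration only, the general formula for block N1 together with the class assignments just described can be expressed as a regular expression, taking class h as glycine, alanine, valine, leucine, isoleucine and methionine and class p as serine, threonine and cysteine, as designated above. The names below and the peptide in the example are hypothetical; the peptide is made up to satisfy the pattern and is not a sequence from the listings.

```python
import re

CLASS_H = "GAVLIM"  # hydrophobic (h)
CLASS_P = "STC"     # polar (p)
ANY = "[A-Z]"       # any amino acid, one-letter code

# C X1 X2 X3 D X4 X5 X6 X7 X8 X9 X10 G
N1_PATTERN = re.compile(
    "C"
    + f"[{CLASS_H}]"      # X1: class h
    + ANY * 2             # X2, X3: any amino acid
    + "D"
    + f"[{CLASS_P}]"      # X4: class p
    + ANY                 # X5: any amino acid
    + f"[{CLASS_H}]" * 3  # X6, X7, X8: class h
    + ANY * 2             # X9, X10: any amino acid
    + "G"
)


def find_n1_blocks(sequence: str) -> list[int]:
    """Return 1-based start positions of non-overlapping N1 matches."""
    return [m.start() + 1 for m in N1_PATTERN.finditer(sequence)]


if __name__ == "__main__":
    toy = "MSK" + "CLAKDTNVLMADG" + "QE"  # made-up peptide
    print(find_n1_blocks(toy))
```

The other blocks of Table 2 could be encoded in the same way from their respective class assignments.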

[0087] Conserved block N2 comprises X11 which belongs to class h, X12 belongs to class b, X13 belongs to class h, X14 belongs to class a, and X15 may be any amino acid.

[0088] Conserved block N3 comprises X16 and X17 which may be any amino acid, X18 belongs to class h, X19 may be any amino acid, X20 belongs to class h, X21, X22, and X23 may be any amino acid, and X24, X25, and X26 are class h.

[0089] Conserved block N4 comprises X27 through X29, X31, and X33 through X40, which may be any amino acid; X30 belongs to class a, and X32 is class h.

[0090] Conserved block EN1 comprises X41 which belongs to class h, X42 and X43 may be any amino acid, X44 and X45 are class h, and X46 is class a.

[0091] Conserved block EN2 comprises X47 through X50 which may be any amino acid, and X51 is class h.

[0092] Conserved block EN3 comprises X52 and X53 which may be any amino acid, X54 is class h, X55 is class a, and X56 is class h.

[0093] Conserved block EN4 comprises X57 which belongs to class b, X58 through X60 may be any amino acid, X61 and X62 are class h, X63 and X64 may be any amino acid, X65 is class h, X66 through X69 may be any amino acid, and X70 is class h.

[0094] Conserved block C1 comprises X71 which belongs to class r, X72 is a member of class p, X73 is class a, X74 through X77 may be any amino acid, and X78 is class h.

[0095] Conserved block C2 comprises X79, X80, and X81, which are class h, and X82, which is class p.

[0096] In one embodiment, the invention includes a nucleic acid probe which hybridizes under stringent conditions to a nucleic acid corresponding to SEQ ID Nos. 13, 15, 17, 19 or 21; more preferably to at least 20 consecutive nucleotides of SEQ ID Nos. 13, 15, 17, 19 or 21; more preferably to at least 40 consecutive nucleotides of SEQ ID Nos. 13, 15, 17, 19 or 21.

[0097] In one embodiment, this invention includes within its scope condition-sensitive mutant inteins. A conditional mutant intein retains its function, i.e., the self-splicing function, under one set of conditions, called permissive, but lacks that function under a different set of conditions, called nonpermissive; the latter must still be permissive for the wild-type allele of the gene. Conditional mutants are presumed, in most cases, to result from missense mutations in a structural gene encoding a protein. In the case of temperature-sensitive (ts) mutants, the amino acid replacement resulting from the missense mutation partially destabilizes the encoded protein, resulting in the maintenance of its three-dimensional integrity only at relatively low temperatures.

[0098] Several types of conditional mutants and methods for producing them have been developed since the original demonstration of the utility of ts mutants (Horowitz, Genetics 33, 612 (1948)). Accordingly, this invention provides a means for generating conditional mutants of any gene product of interest without having to laboriously screen for mutations within the host itself.

[0099] In certain embodiments, the condition-sensitive mutant intein is a temperature sensitive (TS) or cold sensitive (CS) intein. In alternative embodiments, the condition-sensitive mutant intein is sensitive to one or more of pH, exposure to light, unblocking of amino acid residues by dephosphorylation or deglycosylation, ionic concentrations, concentration of various metals, osmolarity, and/or the presence or absence of certain exogenous chemical agents. Examples of exogenous chemicals include agents such as rapamycin or rapamycin analogs useful in mammalian systems and chemicals such as salicylic acid or abscisic acid useful in plant systems. Other examples of an exogenous chemical signalling agent of the present invention include oligonucleotides such as double-stranded nonhydrolyzable synthetic oligonucleotides which are recognized by an endonuclease catalytic site encoded by the regulatable intein of the invention.

[0100] In one embodiment, the temperature sensitive mutant inteins are those which do not undergo self-excision from the target protein at temperatures over about 29° C. In another embodiment, the cold-sensitive mutant inteins are those that do not undergo self-excision at temperatures below 18° C. Preferably, predetermined excision conditions are experimentally determined taking into consideration temperatures at which the target protein will denature or undergo thermal inactivation. Examples of these conditional mutants include temperature sensitive and cold sensitive alleles of the Sce. VMA intein. The specific amino acid changes in these alleles due to these specific mutations are listed in the table below:

TABLE 3 Condition-Sensitive Mutations

Sce. VMA Allele    Amino Acid Change
TS1                L212P
TS4                N278T, L391S
TS7                L122F, L166P, Q259R
TS8                D324G
TS10               S150P, F155L, T233A, N247S, N284D, V450A
TS15               E2K, M47V, F102L, L167S
TS17               D31G, E36G, S63P, E137G, Y154C, N281S
TS18               E103K, S356F
TS19               W157R, L219A
CS1                V451N
CS2                V451T, V452G
CS3                V451K, V452A
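The substitution codes in Table 3 follow the usual convention of wild-type residue, position, mutant residue. The following is a minimal sketch of how such codes might be parsed and applied to an amino acid sequence. The function names are hypothetical, only three alleles are copied from the table for brevity, and the sketch assumes that positions are 1-based and numbered relative to the sequence supplied by the user.

```python
import re

# A few alleles copied from Table 3 (the remaining rows follow the same pattern).
SCE_VMA_ALLELES = {
    "TS1": ["L212P"],
    "TS4": ["N278T", "L391S"],
    "CS2": ["V451T", "V452G"],
}

_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(code):
    """Split a code such as 'L212P' into (wild type, 1-based position, mutant)."""
    match = _SUBSTITUTION.match(code)
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitutions(sequence, codes):
    """Return a copy of `sequence` carrying every substitution in `codes`.

    Raises ValueError if the residue found at a position is not the
    wild-type residue named in the code (assumed numbering mismatch).
    """
    residues = list(sequence)
    for code in codes:
        wild_type, position, mutant = parse_substitution(code)
        if position > len(residues) or residues[position - 1] != wild_type:
            raise ValueError(f"{code}: wild-type residue not found at position {position}")
        residues[position - 1] = mutant
    return "".join(residues)
```

The wild-type check is a deliberate design choice: it catches a numbering offset between the table and the supplied sequence before a mutation is written to the wrong residue.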

[0101] In one embodiment, the condition-sensitive mutant inteins of this invention include a polypeptide which is encoded by a nucleotide sequence that hybridizes under stringent conditions to a nucleic acid sequence represented in one or more of SEQ ID Nos. 13, 15, 17, 19 or 21.

[0102] The present invention also provides probes/primers comprising a substantially purified oligonucleotide, wherein the oligonucleotide comprises a region of nucleotide sequence which hybridizes under stringent conditions to consecutive nucleotides of sense or antisense sequence of SEQ ID Nos. 13, 15, 17, 19 or 21, or naturally occurring mutants thereof. In preferred embodiments, the probe/primer further comprises a label group attached thereto and able to be detected, e.g. the label group is selected from a group consisting of radioisotopes, fluorescent compounds, enzymes, and enzyme co-factors.

[0103] In another embodiment, the inteins of this invention include polypeptide sequences comprising only the N- and C-domains, which are required for the efficient self-splicing of the intein. Thus, this invention includes inteins comprising the minimal portions required for self-splicing, for example inteins comprising mainly the N and C domains together with a minimal linker, such that the linker provides the flexibility required for proper protein folding and consequently proper intein self-splicing.

[0104] The N domain may be about 90-150 amino acids in length. In one embodiment, the N domain is about 130 amino acids in length. In another embodiment, the N domain is about 100 amino acids in length. In yet another embodiment, the N domain is about 95 amino acids in length. In a preferred embodiment, the N domain is about 90 amino acids in length.

[0105] The C domain may be at least 35-55 amino acids in length. In one embodiment, the C domain is about 50 amino acids in length. In another embodiment, the C domain is about 40 amino acids in length, and in a preferred embodiment, the C domain is about 35 amino acids in length.

[0106] These minimal inteins may be generated by deleting the central region encoding the entire endonuclease region. For example, Shingledecker et al. (Gene 207:187-195 (1998)) have shown that a functional intein was formed by the deletion of the entire endonuclease domain from the Mycobacterium tuberculosis recA intein, wherein the deletion resulted in an intein comprising the N and C domains together with an undecapeptide spacer.
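The construction of a minimal intein described above can be sketched at the sequence level as follows. This is an illustrative sketch only: the function name is hypothetical, the domain boundaries are assumed to be known in advance as simple lengths measured from each terminus, and the linker passed in is a placeholder rather than the spacer reported by Shingledecker et al.

```python
def minimal_intein(full_intein, n_len, c_len, linker):
    """Keep the first `n_len` and last `c_len` residues, joined by `linker`.

    Everything between the two retained domains (assumed here to be the
    endonuclease region) is discarded.
    """
    if n_len <= 0 or c_len <= 0:
        raise ValueError("domain lengths must be positive")
    if n_len + c_len > len(full_intein):
        raise ValueError("N and C domains would overlap")
    return full_intein[:n_len] + linker + full_intein[-c_len:]


# Toy sequence: 100-residue N domain, 200-residue central region, 40-residue C domain.
toy_intein = "N" * 100 + "E" * 200 + "C" * 40
placeholder_linker = "G" * 11  # hypothetical 11-residue spacer, not a published sequence
mini = minimal_intein(toy_intein, n_len=100, c_len=40, linker=placeholder_linker)
```

With the toy lengths shown, `mini` contains 151 residues and none of the central region.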

[0107] In another embodiment, this invention includes inteins wherein the N and/or the C domains are synthesized separately and reconstituted to provide a self-splicing intein. The N and C domains may either be isolated and purified or may be synthesized. In addition, these domains may be from the same or different target (host) polypeptides. In one embodiment, the invention also includes within its scope an N-extein-N-intein fragment which may be expressed in cells and a C-intein-C-extein fragment, which may be independently expressed in cells, wherein interaction of the two fragments yields a full-length N-extein-N-intein-C-intein-C-extein polypeptide product.

[0108] In another aspect, the invention also includes an N-extein-N-intein-L (ligand) fragment which may be expressed in cells and an LBD (ligand binding domain)-C-intein-C-extein fragment, which may be independently expressed in cells, wherein interaction between the ligand and the ligand binding domains of the two fragments yields a full-length N-extein-N-intein-L-LBD-C-intein-C-extein polypeptide product. Examples of suitable ligands and ligand binding domains include, but are not limited to, polypeptides such as FK506 binding proteins/RAP-binding proteins, and antibody/hapten pairs. A skilled artisan can readily adapt any known protein binding domain/ligand pair for use in the present methods. Further, as will be evident to the skilled artisan, the ligand and the ligand binding domain may be interchangeably present on either fragment described herein.

[0109] Formation of the full-length N-extein-N-intein-C-intein-C-extein polypeptide or the N-extein-N-intein-L-LBD-C-intein-C-extein polypeptide product is followed by excision of the intein to produce a functional target protein.

[0110] In one aspect of this invention, either the formation of the full-length polypeptide or the splicing of the intein after the formation of the full-length polypeptide may be subject to exogenous regulation.

[0111] The linker used herein may be any linker which provides the flexibility required for the intein to fold properly and form the splicing active site, bringing together the two splice junctions and other amino acid residues which may assist in the splicing reaction. This linker can facilitate enhanced flexibility of the intein, allowing the N- and C-domains to freely and (optionally) simultaneously interact by reducing steric hindrance between the two fragments, as well as allowing appropriate folding of each portion to occur. The linker can be of natural origin, such as a sequence determined to exist in random coil between two domains of a protein. Alternatively, the linker can be of synthetic origin.

[0112] In one embodiment, the linker may be a peptide linker, for instance, the linker may be a poly-glycine linker, or a linker containing Asn-Gly repeats, or Gly-Ser repeats. In a preferred embodiment the linker is a (Gly4Ser)3 sequence. Peptide linkers may be about 5-50 amino acids in length; more preferably the linker is 5-30 amino acids in length, and most preferably the linker is 6-20 amino acid residues in length. Linkers of this type are described in Huston et al. (1988) PNAS 85:4879; and U.S. Pat. Nos. 5,091,513 and 5,258,498. Naturally occurring unstructured linkers of human origin are preferred as they reduce the risk of immunogenicity.
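By way of illustration, the (Gly4Ser)3 linker and the length ranges recited above can be expressed as a short sketch. The function names and the wording of the returned labels are hypothetical; the ranges are taken directly from the paragraph above.

```python
def gly_ser_linker(repeats=3, glycines=4):
    """Build a (Gly_n Ser)_m linker in one-letter code; defaults give (Gly4Ser)3."""
    return ("G" * glycines + "S") * repeats


def length_class(linker):
    """Report the narrowest of the recited length ranges that contains the linker."""
    n = len(linker)
    if 6 <= n <= 20:
        return "most preferred (6-20)"
    if 5 <= n <= 30:
        return "more preferred (5-30)"
    if 5 <= n <= 50:
        return "permitted (5-50)"
    return "outside the recited ranges"


linker = gly_ser_linker()  # 15 residues: GGGGSGGGGSGGGGS
```

The default linker is 15 residues long and therefore falls within the most preferred range.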

[0113] This invention further contemplates a method for generating sets of combinatorial mutants of the subject intein proteins as well as truncation mutants, and is especially useful for identifying potential variant sequences (e.g., homologs). The purpose of screening such combinatorial libraries is to generate, for example, novel conditional intein equivalents which can be used in the method of the present invention. For example, the combinatorially-derived homologs can be generated to have an increased sensitivity of regulation relative to a given intein conditional allele. Alternatively, the combinatorially-derived conditional intein homolog may correspond to an altered nucleic acid sequence which, for example, facilitates cloning into a target gene or which alters codon utilization to correspond to a more preferred set of codons for a given organism in which the regulated target gene is to be expressed (for review of organismal codon bias see e.g. Sharp et al. (1988) Nucleic Acids Res. 16: 8207-11).

[0114] In one embodiment, the variegated library of intein variants is generated by combinatorial mutagenesis at the nucleic acid level, and is encoded by a variegated gene library. For instance, a mixture of synthetic oligonucleotides can be enzymatically ligated into gene sequences such that the degenerate set of potential intein sequences are expressible as individual polypeptides, or alternatively, as a set of larger fusion proteins (e.g., for phage display) containing the set of intein sequences therein.

[0115] There are many ways by which such libraries of potential intein homologs can be generated from a degenerate oligonucleotide sequence. Chemical synthesis of a degenerate gene sequence can be carried out in an automatic DNA synthesizer, and the synthetic genes then ligated into an appropriate expression vector. The purpose of a degenerate set of genes is to provide, in one mixture, all of the sequences encoding the desired set of potential intein sequences. The synthesis of degenerate oligonucleotides is well known in the art (see for example, Narang, S A (1983) Tetrahedron 39:3; Itakura et al. (1981) Recombinant DNA, Proc 3^(rd) Cleveland Sympos. Macromolecules, ed. A G Walton, Amsterdam: Elsevier pp 273-289; Itakura et al. (1984) Annu. Rev. Biochem. 53:323; Itakura et al. (1984) Science 198:1056; Ike et al. (1983) Nucleic Acid Res. 11:477). Such techniques have been employed in the directed evolution of other proteins (see, for example, Scott et al. (1990) Science 249:386-390; Roberts et al. (1992) PNAS 89:2429-2433; Devlin et al. (1990) Science 249: 404-406; Cwirla et al. (1990) PNAS 87: 6378-6382; as well as U.S. Pat. Nos. 5,223,409, 5,198,346, and 5,096,815).
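The notion of a degenerate oligonucleotide providing, in one mixture, all of the desired sequences can be illustrated with a short sketch that expands a degenerate sequence into its individual members. The sketch assumes the standard IUPAC nucleotide ambiguity codes; the function names and the example codon NNK are illustrative choices and are not taken from the disclosure.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(oligo):
    """Number of distinct sequences encoded by a degenerate oligonucleotide."""
    count = 1
    for base in oligo.upper():
        count *= len(IUPAC[base])
    return count


def expand(oligo):
    """List every concrete sequence encoded by a degenerate oligonucleotide."""
    choices = (IUPAC[base] for base in oligo.upper())
    return ["".join(bases) for bases in product(*choices)]


members = expand("NNK")  # one randomized codon, third base restricted to G or T
```

Calling `degeneracy` before `expand` is advisable for longer oligonucleotides, because the number of members grows multiplicatively with each degenerate position.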

[0116] Likewise, a library of coding sequence fragments can be provided for an intein clone in order to generate a variegated population of intein fragments for screening and subsequent selection of bioactive fragments. A variety of techniques are known in the art for generating such libraries, including chemical synthesis. In one embodiment, a library of coding sequence fragments can be generated by (i) treating a double stranded PCR fragment of an intein coding sequence with a nuclease under conditions wherein nicking occurs only about once per molecule; (ii) denaturing the double stranded DNA; (iii) renaturing the DNA to form double stranded DNA which can include sense/antisense pairs from different nicked products; (iv) removing single stranded portions from reformed duplexes by treatment with S1 nuclease; and (v) ligating the resulting fragment library into an expression vector. By this exemplary method, an expression library can be derived which codes for N-terminal, C-terminal and internal fragments of various sizes.

[0117] A wide range of techniques are known in the art for screening gene products of combinatorial libraries made by point mutations or truncation, and for screening cDNA libraries for gene products having a certain property. Such techniques will be generally adaptable for rapid screening of the gene libraries generated by the combinatorial mutagenesis of intein homologs. The most widely used techniques for screening large gene libraries typically comprise cloning the gene library into replicable expression vectors, transforming appropriate cells with the resulting library of vectors, and expressing the combinatorial genes under conditions in which detection of a desired activity facilitates relatively easy isolation of the vector encoding the gene whose product was detected. Each of the illustrative assays described below is amenable to high through-put analysis as necessary to screen large numbers of degenerate intein sequences created by combinatorial mutagenesis techniques. Combinatorial mutagenesis has a potential to generate very large libraries of mutant proteins, e.g., on the order of 10^(26) molecules. Combinatorial libraries of this size may be technically challenging to screen even with high throughput screening assays. To overcome this problem, a new technique has been developed recently, recursive ensemble mutagenesis (REM), which allows one to avoid the very high proportion of non-functional proteins in a random library and simply enhances the frequency of functional proteins, thus decreasing the complexity required to achieve a useful sampling of sequence space. REM is an algorithm which enhances the frequency of functional mutants in a library when an appropriate selection or screening method is employed (Arkin and Youvan, 1992, PNAS USA 89:7811-7815; Youvan et al., 1992, Parallel Problem Solving from Nature, 2., In Maenner and Manderick, eds., Elsevier Publishing Co., Amsterdam, pp. 401-410; Delagrave et al., 1993, Protein Engineering 6(3):327-331).
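The scale of such libraries follows from simple arithmetic: fully randomizing n residue positions over the 20 standard amino acids gives 20 to the power n variants. The sketch below is an illustration only, reading the library size quoted above as ten to the 26th power; the function names are hypothetical, and the choice of full randomization over 20 amino acids is an assumption made for the example rather than a statement of how any particular library was built.

```python
import math


def library_size(positions, alphabet=20):
    """Number of protein variants when `positions` residues are fully randomized."""
    return alphabet ** positions


def positions_for(size, alphabet=20):
    """Smallest number of fully randomized positions giving at least `size` variants."""
    return math.ceil(math.log(size) / math.log(alphabet))


size_20 = library_size(20)    # 20**20, slightly more than 1e26
needed = positions_for(1e26)  # randomized positions needed to reach 1e26 variants
```

Under these assumptions, about twenty fully randomized positions are enough to reach a library of that order, which is why methods that enrich for functional variants are attractive.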

4.4. Modification of Target Genes and Polypeptides

[0118] The invention provides methods by which a target polypeptide which encodes at least one bioactivity can be modified by the insertion of a regulatable intein such that the bioactivity becomes controllable by regulating the excision of the regulatable intein. We provide herein specific examples in which a target polypeptide, selected by virtue of its encoded bioactivity, is modified by the insertion of such a regulatable intein sequence (see Examples). General considerations to be made by the skilled artisan when engineering the target polypeptide::intein hybrid are discussed below. Further minor considerations will be obvious to those of skill in the art.

[0119] The sequences of naturally occurring intein-containing genes, along with various mechanistic studies on intein excision, provide guidance for the modification of a target polypeptide with a regulatable intein. For example, the inserted intein open reading frame (ORF) must be “in frame” with the target polypeptide at the point of insertion in order that a full-length target polypeptide::intein of the general structure N-extein target polypeptide-intein-C-extein target polypeptide can be made. The reading frame must be retained across both the N-extein/intein junction and the intein/C-extein junction.
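The reading-frame requirement can be made concrete with a minimal sketch that inserts an intein ORF at a codon boundary of a target ORF and rejects inputs that would break the frame. The function name and the toy sequences are hypothetical, and the sketch assumes upper-case DNA, a target ORF that begins at its first codon, and an intein ORF supplied without its own stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def insert_in_frame(target_orf, intein_orf, codon_index):
    """Insert `intein_orf` immediately before codon `codon_index` (0-based) of `target_orf`.

    Both junctions stay in frame because the cut falls on a codon boundary
    and the inserted ORF is a whole number of codons with no internal stop.
    """
    if len(target_orf) % 3 or len(intein_orf) % 3:
        raise ValueError("both ORFs must be a whole number of codons")
    intein_codons = [intein_orf[i:i + 3] for i in range(0, len(intein_orf), 3)]
    if any(codon in STOP_CODONS for codon in intein_codons):
        raise ValueError("intein ORF contains an in-frame stop codon")
    cut = 3 * codon_index
    if not 0 < cut < len(target_orf):
        raise ValueError("insertion point must lie inside the target ORF")
    return target_orf[:cut] + intein_orf + target_orf[cut:]


# Toy example: insert a three-codon "intein" before the Cys codon (TGC) of the target.
hybrid = insert_in_frame("ATGAAATGCTAA", "TGCGGCAAC", codon_index=2)
```

In the toy example the hybrid ORF reads ATG AAA, then the three inserted codons, then TGC TAA, so the downstream target codons are unchanged.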

[0120] Alternatively, two separate hybrid polypeptides corresponding to a first N-extein target polypeptide-N-terminal-intein polypeptide and a second C-terminal-intein-C-terminal-extein polypeptide can be engineered so that a regulatable trans-splicing auto-excision event results in the joining of the N-extein and C-extein polypeptide segments to produce a trans-spliced target polypeptide. In this embodiment, the N-extein/intein junction and the intein/C-extein junction are each engineered separately, but nevertheless must each be made to retain the existing reading frame across each polypeptide junction.

[0121] A second consideration for the site of insertion into the target polypeptide of the regulatable intein sequence is selection of a site adjacent to a target polypeptide hydroxyl or thiol moiety such as provided by the amino acid side chain of a serine, threonine or cysteine residue. Polypeptide sequence alignments of naturally-occurring intein-containing gene products reveal the existence of a conserved serine, threonine or cysteine at the site of insertion into the host protein (Perler F B, et al. (1997) Nucleic Acids Res. 25:1087-93). Furthermore, mutagenesis of this conserved serine, threonine or cysteine at the intein-C-extein junction resulted in loss of intein autoexcision activity (Hirata et al. (1992) Biochem. Biophys. Res. Commun. 188: 40-47; Cooper et al. (1993) EMBO J 12: 2575-83; Davis et al. (1992) Cell 71, 201-10). Certain studies have suggested that the identity of the amino-terminal residue of the intein, which is also a conserved serine, threonine or cysteine, should match that of this conserved amino-terminal residue of the C-extein, particularly when the amino-terminal intein residue is a cysteine (Chong et al. (1996) J Biol. Chem. 271: 22159-68). Therefore, in preferred embodiments, the conditional intein polypeptide is inserted upstream (amino-terminal) to a cysteine, serine or threonine, the identity of which matches that of the amino-terminal residue of the selected intein. This limitation on the site of intein insertion into the host polypeptide should not prove limiting however, as serine, threonine and cysteine collectively account for well over ten percent of the total amino acid composition of a number of representative proteins (Lehninger (1976) Worth Publishers, Inc., p. 101).
Therefore, by selection of an appropriate conditional intein, virtually any target polypeptide can be modified at an endogenous serine, threonine or cysteine residue to yield a target polypeptide::intein hybrid gene product from which, under appropriate conditions, the endogenous auto-excision activity of the intein can be activated and the inserted intein sequence thereby excised from the target polypeptide. Furthermore, in order for the inserted conditional intein to exert control of a bioactivity of the target polypeptide, in preferred embodiments, the site of insertion of the intein polypeptide must be selected so as to interfere with the bioactivity when the intein is present in the target::intein hybrid. Guidance in constructing such a hybrid is provided above.
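The site-selection rule described above lends itself to a simple scan of the target sequence. The following sketch lists the residues of a target polypeptide that could serve as the first C-extein residue for a given intein, and computes the combined serine, threonine and cysteine content. The function names and the toy sequence are hypothetical; the sketch considers only the residue-matching rule and ignores the separate requirement that the insertion interfere with bioactivity.

```python
def candidate_sites(target, intein_first_residue):
    """Return 0-based indices of target residues matching the intein's first residue.

    The intein would be inserted immediately upstream of each listed residue,
    which then becomes the first residue of the C-extein. Index 0 is skipped
    because an insertion there would leave no N-extein.
    """
    if intein_first_residue not in ("C", "S", "T"):
        raise ValueError("intein must begin with Cys, Ser or Thr")
    return [i for i, aa in enumerate(target) if i > 0 and aa == intein_first_residue]


def cst_fraction(target):
    """Fraction of residues in the target that are Cys, Ser or Thr."""
    return sum(aa in "CST" for aa in target) / len(target)


toy_target = "MKCASCT"  # hypothetical sequence for illustration
sites_for_cys_intein = candidate_sites(toy_target, "C")
```

For the toy sequence, a cysteine-initiated intein has two candidate sites, at indices 2 and 5.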

[0122] In certain specialized embodiments of the invention, the target polypeptide encodes a bioactivity which is partially or completely inactive in the absence of an inserted intein. Such target polypeptides may correspond, for example, to the fusion of two polypeptides which interact with one another to produce a measurable bioactivity but which are fused in such close proximity (e.g. directly abutting the polypeptide domains or fusing them with only a short linker polypeptide) as to cause a steric inhibition of their interaction. In this particular instance, the insertion of a heterologous regulatable intein sequence between the two domains causes an increase in the bioactivity resulting from the appropriate and sterically proper interaction of the two target polypeptides. This particular embodiment of the invention allows for the regulation of the target polypeptide in a manner opposite that of the preferred embodiment discussed above; that is, signals which increase the self-excision of the inserted intein (such as intein self-excision agonist compounds) actually decrease the target polypeptide bioactivity, whereas signals which decrease the self-excision of the inserted intein (such as intein self-excision antagonist compounds) actually increase the target polypeptide bioactivity.

4.5. Methods of Preparing Target:Intein Hybrid Polypeptides

[0123] The intein-target hybrids may be prepared by methods which are well known in the art. The method contemplates both in vivo and in vitro methods for creating these hybrids. In preferred embodiments a nucleic acid encoding a regulatable intein is inserted into a nucleic acid which encodes a target polypeptide as shown in FIG. 2. General cloning techniques (see e.g. Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Press)) can be used in the method of the invention to obtain suitable target gene:intein hybrid nucleic acids of the invention. The invention provides other techniques particularly well suited to the insertion of the regulatable intein-encoding nucleic acid sequence into the target polypeptide-encoding nucleic acid sequence while retaining the correct reading frame of the target gene at both the upstream and downstream insertion junctions. Attention to the reading frame of the target gene allows recombinant production of the target polypeptide:intein hybrid polypeptide.

[0124] For example, in one aspect, the method includes a PCR-based approach called splicing by overlap extension (SOE) which is not sequence-dependent and does not depend on the occurrence of restriction enzyme recognition sequences at the recombination site. Gene splicing by overlap extension is an effective way of recombining DNA molecules at precise junctions irrespective of nucleotide sequences at the recombination site and without the use of restriction endonucleases or ligase. Fragments from the genes that are to be recombined are generated in separate polymerase chain reactions (PCRs). The primers are designed so that the ends of the products contain complementary sequences. When these PCR products are mixed, denatured, and reannealed, the strands having the matching sequences at their 3′ ends overlap and act as primers for each other. Extension of this overlap by DNA polymerase produces a molecule in which the original sequences are ‘spliced’ together. This technique is used to construct a gene encoding a mosaic protein comprised of an intein and a target polypeptide.
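The logic of splicing by overlap extension can be modeled in a few lines at the level of top-strand sequences. This is a simplified sketch under stated assumptions: it does not model annealing temperatures, polymerase extension or the bottom strand, the function names and toy sequences are hypothetical, and the primer tail is modeled simply as a copy of the last bases of the upstream fragment.

```python
def add_overlap_tail(left, right, overlap):
    """Model amplifying `right` with a primer whose 5' tail copies the last
    `overlap` bases of `left`, so the two products share a common end."""
    if not 0 < overlap <= len(left):
        raise ValueError("overlap must be between 1 and len(left)")
    return left[-overlap:] + right


def soe_join(left_product, right_product, min_overlap=15):
    """Fuse two products that share an identical terminal overlap.

    The longest shared overlap of at least `min_overlap` bases is used,
    and the overlapping bases appear only once in the spliced product.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    longest = min(len(left_product), len(right_product))
    for k in range(longest, min_overlap - 1, -1):
        if left_product[-k:] == right_product[:k]:
            return left_product + right_product[k:]
    raise ValueError("products share no overlap of the required length")


# Toy fragments; real overlaps are typically longer than the 8 bases used here.
target_part = "ATGGCTAAAGGTCTG"
intein_part = "TGCTTTGCCAAGGGT"
tailed_intein = add_overlap_tail(target_part, intein_part, overlap=8)
spliced = soe_join(target_part, tailed_intein, min_overlap=8)
```

The spliced product is the exact concatenation of the two toy fragments, reflecting the point made above that the junction is determined by primer design rather than by restriction sites.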

[0125] In certain situations, the SOE method of recombining gene sequences is a significant improvement over standard techniques. This method is particularly useful when sequences must be precisely joined within a very limited region. In addition to being an improved method for recombining DNA, SOE allows site-directed mutagenesis to be performed simultaneously with recombination. The product in a SOE reaction is a mosaic of natural sequences connected by synthetic regions, and the sequence of these synthetic regions is entirely at the discretion of the genetic engineer.

4.6. Agonist and Antagonist Signals of the Invention

[0126] The invention further provides signals which are used to regulate the self-excision activity of an intein polypeptide. In general, the selection of a signal is predicated upon the nature of the intein to be regulated. For example, self-excision of the temperature-sensitive conditional inteins can be antagonized by increasing the temperature, while self-excision of the cold-sensitive conditional inteins can be antagonized by decreasing the temperature. In contrast, the trans-spliced regulatable inteins described herein can be agonized by the addition of an exogenous chemical dimerizer such as rapamycin. Each of these examples entails the use of a genetically modified intein; however, the invention provides methods by which an intein which has not been genetically modified can be regulated by means of an appropriate agonist or antagonist signal.
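The relationship between intein type and regulatory signal can be summarized as a small decision function. This is a deliberately coarse sketch: the function name and the string labels are hypothetical, the temperature cut-offs are the approximate values given earlier for the temperature sensitive (about 29° C.) and cold-sensitive (18° C.) embodiments, and real splicing efficiency varies gradually with conditions rather than switching at a single threshold.

```python
TS_MAX_PERMISSIVE_C = 29.0  # TS inteins: no self-excision above about 29 C
CS_MIN_PERMISSIVE_C = 18.0  # CS inteins: no self-excision below 18 C


def excision_permitted(intein_kind, temperature_c=None, dimerizer_present=False):
    """Return True if the signal conditions permit self-excision.

    `intein_kind` is one of "TS", "CS" or "trans" (dimerizer-dependent
    trans-splicing). Thresholds are illustrative cut-offs only.
    """
    if intein_kind == "trans":
        return bool(dimerizer_present)
    if temperature_c is None:
        raise ValueError("temperature is required for TS and CS inteins")
    if intein_kind == "TS":
        return temperature_c <= TS_MAX_PERMISSIVE_C
    if intein_kind == "CS":
        return temperature_c >= CS_MIN_PERMISSIVE_C
    raise ValueError(f"unknown intein kind: {intein_kind!r}")
```

For example, `excision_permitted("TS", 37.0)` is false while `excision_permitted("TS", 25.0)` is true, mirroring the statement that raising the temperature antagonizes a temperature-sensitive intein.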

[0127] For example, many naturally-occurring inteins encode a homing endonuclease activity which recognizes and cleaves a nucleic acid sequence adjacent to the site of its insertion into the host gene. This cleavage event initiates a series of recombinogenic events which can effect the “mobilization” of the intein-encoding sequence. The nucleic acid sequence recognized by the homing endonuclease can thus be identified from the nucleic acid sequence surrounding this junction (see e.g., Nishioka, et al. (1998) Nucleic Acids Res. 26: 4409-12). A double-stranded oligonucleotide which comprises the minimal recognition sequence for such an endonuclease will therefore bind to a target:intein hybrid polypeptide which carries this endonuclease function. This provides for a readily-identifiable high affinity ligand for use in directly or indirectly regulating an intein self-excision activity. For example, a nonhydrolyzable synthetic oligonucleotide which binds tightly to the intein endonuclease catalytic site but does not undergo hydrolytic chain breakage can be used to antagonize an intein self-excision reaction. Preferably, such a nonhydrolyzable substrate is designed to mimic a substrate transition state which occurs during catalysis. Such transition state analogs frequently bind with extremely high affinities to the corresponding catalytic site and thereby inhibit catalysis of the natural substrates. In some embodiments, the formation of an oligonucleotide/intein-endonuclease complex prevents self-excision of the intein from the target polypeptide. In these instances, the synthetic oligonucleotide alone can serve as a signaling agent in the method of the invention.
In preferred embodiments, the synthetic oligonucleotide is further modified to include one or more activities which serve to agonize or antagonize the self-excision of the intein. For example, self-excision can be readily antagonized by addition of chemically active amino acid crosslinking groups which, in preferred embodiments, recognize one or more of the amino acid side groups which function in the intein self-excision reaction.

[0128] Still other signals of the invention include those which can be identified by routine screening for chemical ligands or inhibitors of intein self-excision using appropriate high-throughput screening techniques.

4.7. Nucleic Acid Compositions

[0129] In another aspect of the invention, the proteins described herein are provided in expression vectors. For instance, expression vectors are contemplated which include a nucleotide sequence encoding a polypeptide containing a composite activator of the present invention, which coding sequence is operably linked to at least one transcriptional regulatory sequence. Regulatory sequences for directing expression of the instant fusion proteins are art-recognized and are selected by a number of well understood criteria. Exemplary regulatory sequences are described in Goeddel, Gene Expression Technology: Methods in Enzymology, Academic Press, San Diego, Calif. (1990). For instance, any of a wide variety of expression control sequences that control the expression of a DNA sequence when operatively linked to it may be used in these vectors to express DNA sequences encoding the fusion proteins of this invention. Such useful expression control sequences include, for example, the early and late promoters of SV40, adenovirus or cytomegalovirus immediate early promoter, the lac system, the trp system, the TAC or TRC system, T7 promoter whose expression is directed by T7 RNA polymerase, the promoter for 3-phosphoglycerate kinase or other glycolytic enzymes, the promoters of acid phosphatase, e.g., Pho5, and the promoters of the yeast α-mating factors and other sequences known to control the expression of genes of prokaryotic or eukaryotic cells or their viruses, and various combinations thereof. It should be understood that the design of the expression vector may depend on such factors as the choice of the host cell to be transformed. Moreover, the vector's copy number, the ability to control that copy number and the expression of any other protein encoded by the vector, such as antibiotic markers, should also be considered.

[0130] As will be apparent, the subject gene constructs can be used to cause expression of the subject fusion proteins in cells propagated in culture, e.g. to produce proteins or polypeptides, including fusion proteins, for purification.

[0131] This invention also pertains to a host cell transfected with a recombinant gene in order to express one of the subject polypeptides. The host cell may be any prokaryotic or eukaryotic cell. For example, a fusion protein of the present invention may be expressed in bacterial cells such as E. coli, insect cells (baculovirus), yeast, or mammalian cells. Other suitable host cells are known to those skilled in the art.

[0132] Accordingly, the present invention further pertains to methods of producing the subject fusion proteins, e.g., the target polypeptide:intein chimeric polypeptides described herein. For example, a host cell transfected with an expression vector encoding a protein of interest can be cultured under appropriate conditions to allow expression of the protein to occur. The protein may be secreted, by inclusion of a secretion signal sequence, and isolated from a mixture of cells and medium containing the protein. Alternatively, the protein may be retained cytoplasmically and the cells harvested, lysed and the protein isolated. A cell culture includes host cells, media and other byproducts. Suitable media for cell culture are well known in the art. The proteins can be isolated from cell culture medium, host cells, or both using techniques known in the art for purifying proteins, including ion-exchange chromatography, gel filtration chromatography, ultrafiltration, electrophoresis, and immunoaffinity purification with antibodies specific for particular epitopes of the protein.

[0133] Thus, a coding sequence for a fusion protein of the present invention can be used to produce a recombinant form of the protein via microbial or eukaryotic cellular processes. Ligating the polynucleotide sequence into a gene construct, such as an expression vector, and transforming or transfecting into hosts, either eukaryotic (yeast, avian, insect or mammalian) or prokaryotic (bacterial cells), are standard procedures.

[0134] Expression vehicles for production of a recombinant protein include plasmids and other vectors. For instance, suitable vectors for the expression of the instant fusion proteins include plasmids of the types: pBR322-derived plasmids, pEMBL-derived plasmids, pEX-derived plasmids, pBTac-derived plasmids and pUC-derived plasmids for expression in prokaryotic cells, such as E. coli.

[0135] A number of vectors exist for the expression of recombinant proteins in yeast. For instance, YEP24, YIP5, YEP51, YEP52, pYES2, and YRP17 are cloning and expression vehicles useful in the introduction of genetic constructs into S. cerevisiae (see, for example, Broach et al., (1983) in Experimental Manipulation of Gene Expression, ed. M. Inouye, Academic Press, p. 83, incorporated by reference herein). These vectors can replicate in E. coli due to the presence of the pBR322 ori, and in S. cerevisiae due to the replication determinant of the yeast 2 micron plasmid. In addition, drug resistance markers such as ampicillin can be used.

[0136] The preferred mammalian expression vectors contain both prokaryotic sequences to facilitate the propagation of the vector in bacteria, and one or more eukaryotic transcription units that are expressed in eukaryotic cells. The pcDNAI/amp, pcDNAI/neo, pRc/CMV, pSV2gpt, pSV2neo, pSV2-dhfr, pTk2, pRSVneo, pMSG, pSVT7, pko-neo and pHyg derived vectors are examples of mammalian expression vectors suitable for transfection of eukaryotic cells. Some of these vectors are modified with sequences from bacterial plasmids, such as pBR322, to facilitate replication and drug resistance selection in both prokaryotic and eukaryotic cells. Alternatively, derivatives of viruses such as the bovine papilloma virus (BPV-1), or Epstein-Barr virus (pHEBo, pREP-derived and p205) can be used for transient expression of proteins in eukaryotic cells. Examples of other viral (including retroviral) expression systems can be found below in the description of gene therapy delivery systems. The various methods employed in the preparation of the plasmids and transformation of host organisms are well known in the art. For other suitable expression systems for both prokaryotic and eukaryotic cells, as well as general recombinant procedures, see Molecular Cloning: A Laboratory Manual, 2nd Ed., ed. by Sambrook, Fritsch and Maniatis (Cold Spring Harbor Laboratory Press, 1989) Chapters 16 and 17. In some instances, it may be desirable to express the recombinant fusion proteins by the use of a baculovirus expression system. Examples of such baculovirus expression systems include pVL-derived vectors (such as pVL1392, pVL1393 and pVL941), pAcUW-derived vectors (such as pAcUW1), and pBlueBac-derived vectors (such as the β-gal containing pBlueBac III).

[0137] In yet other embodiments, the subject expression constructs are derived by insertion of the subject gene into viral vectors including recombinant retroviruses, adenovirus, adeno-associated virus, and herpes simplex virus-1, or recombinant bacterial or eukaryotic plasmids. As described in greater detail below, such embodiments of the subject expression constructs are specifically contemplated for use in various in vivo and ex vivo gene therapy protocols.

[0138] Retrovirus vectors and adeno-associated virus vectors are generally understood to be the recombinant gene delivery system of choice for the transfer of exogenous genes in vivo, particularly into humans. These vectors provide efficient delivery of genes into cells, and the transferred nucleic acids are stably integrated into the chromosomal DNA of the host. A major prerequisite for the use of retroviruses is to ensure the safety of their use, particularly with regard to the possibility of the spread of wild-type virus in the cell population. The development of specialized cell lines (termed “packaging cells”) which produce only replication-defective retroviruses has increased the utility of retroviruses for gene therapy, and defective retroviruses are well characterized for use in gene transfer for gene therapy purposes (for a review see Miller, A. D. (1990) Blood 76:271). Thus, recombinant retrovirus can be constructed in which part of the retroviral coding sequence (gag, pol, env) has been replaced by nucleic acid encoding a fusion protein of the present invention, e.g., a composite activator, rendering the retrovirus replication defective. The replication defective retrovirus is then packaged into virions which can be used to infect a target cell through the use of a helper virus by standard techniques. Protocols for producing recombinant retroviruses and for infecting cells in vitro or in vivo with such viruses can be found in Current Protocols in Molecular Biology, Ausubel, F. M. et al., (eds.) Greene Publishing Associates, (1989), Sections 9.10-9.14 and other standard laboratory manuals. Examples of suitable retroviruses include pLJ, pZIP, pWE and pEM which are well known to those skilled in the art. Examples of suitable packaging virus lines for preparing both ecotropic and amphotropic retroviral systems include ψCrip, ψCre, ψ2 and ψAm.
Retroviruses have been used to introduce a variety of genes into many different cell types, including neural cells, epithelial cells, endothelial cells, lymphocytes, myoblasts, hepatocytes, bone marrow cells, in vitro and/or in vivo (see for example Eglitis et al., (1985) Science 230:1395-1398; Danos and Mulligan, (1988) PNAS USA 85:6460-6464; Wilson et al., (1988) PNAS USA 85:3014-3018; Armentano et al., (1990) PNAS USA 87:6141-6145; Huber et al., (1991) PNAS USA 88:8039-8043; Ferry et al., (1991) PNAS USA 88:8377-8381; Chowdhury et al., (1991) Science 254:1802-1805; van Beusechem et al., (1992) PNAS USA 89:7640-7644; Kay et al., (1992) Human Gene Therapy 3:641-647; Dai et al., (1992) PNAS USA 89:10892-10895; Hwu et al., (1993) J. Immunol. 150:4104-4115; U.S. Pat. Nos. 4,868,116; 4,980,286; PCT Application WO 89/07136; PCT Application WO 89/02468; PCT Application WO 89/05345; and PCT Application WO 92/07573).

[0139] Furthermore, it has been shown that it is possible to limit the infection spectrum of retroviruses and consequently of retroviral-based vectors, by modifying the viral packaging proteins on the surface of the viral particle (see, for example PCT publications WO93/25234, WO94/06920, and WO94/11524). For instance, strategies for the modification of the infection spectrum of retroviral vectors include: coupling antibodies specific for cell surface antigens to the viral env protein (Roux et al., (1989) PNAS USA 86:9079-9083; Julan et al., (1992) J. Gen Virol 73:3251-3255; and Goud et al., (1983) Virology 163:251-254); or coupling cell surface ligands to the viral env proteins (Neda et al., (1991) J. Biol. Chem. 266:14143-14146). Coupling can be in the form of chemical cross-linking with a protein or other variety (e.g. lactose to convert the env protein to an asialoglycoprotein), as well as by generating fusion proteins (e.g. single-chain antibody/env fusion proteins). This technique, while useful to limit or otherwise direct the infection to certain tissue types, can also be used to convert an ecotropic vector into an amphotropic vector.

[0140] Another viral gene delivery system useful in the present invention utilizes adenovirus-derived vectors. The genome of an adenovirus can be manipulated such that it encodes a gene product of interest, but is inactivated in terms of its ability to replicate in a normal lytic viral life cycle (see, for example, Berkner et al., (1988) BioTechniques 6:616; Rosenfeld et al., (1991) Science 252:431-434; and Rosenfeld et al., (1992) Cell 68:143-155). Suitable adenoviral vectors derived from the adenovirus strain Ad type 5 dl324 or other strains of adenovirus (e.g., Ad2, Ad3, Ad7 etc.) are well known to those skilled in the art. Recombinant adenoviruses can be advantageous in certain circumstances in that they are capable of infecting nondividing cells and can be used to infect a wide variety of cell types, including airway epithelium (Rosenfeld et al., (1992) cited supra), endothelial cells (Lemarchand et al., (1992) PNAS USA 89:6482-6486), hepatocytes (Herz and Gerard, (1993) PNAS USA 90:2812-2816) and muscle cells (Quantin et al., (1992) PNAS USA 89:2581-2584). Furthermore, the virus particle is relatively stable and amenable to purification and concentration, and as above, can be modified so as to affect the spectrum of infectivity. Additionally, introduced adenoviral DNA (and foreign DNA contained therein) is not integrated into the genome of a host cell but remains episomal, thereby avoiding potential problems that can occur as a result of insertional mutagenesis in situations where introduced DNA becomes integrated into the host genome (e.g., retroviral DNA). Moreover, the carrying capacity of the adenoviral genome for foreign DNA is large (up to 8 kilobases) relative to other gene delivery vectors (Berkner et al., supra; Haj-Ahmad and Graham (1986) J. Virol. 57:267).
Most replication-defective adenoviral vectors currently in use and therefore favored by the present invention are deleted for all or parts of the viral E1 and E3 genes but retain as much as 80% of the adenoviral genetic material (see, e.g., Jones et al., (1979) Cell 16:683; Berkner et al., supra; and Graham et al., in Methods in Molecular Biology, E. J. Murray, Ed. (Humana, Clifton, N.J., 1991) vol. 7, pp. 109-127). Expression of the inserted chimeric gene can be under control of, for example, the E1A promoter, the major late promoter (MLP) and associated leader sequences, the viral E3 promoter, or exogenously added promoter sequences.

[0141] Yet another viral vector system useful for delivery of the subject chimeric genes is the adeno-associated virus (AAV). Adeno-associated virus is a naturally occurring defective virus that requires another virus, such as an adenovirus or a herpes virus, as a helper virus for efficient replication and a productive life cycle. (For a review, see Muzyczka et al., Curr. Topics in Micro. and Immunol. (1992) 158:97-129). It is also one of the few viruses that may integrate its DNA into non-dividing cells, and exhibits a high frequency of stable integration (see for example Flotte et al., (1992) Am. J. Respir. Cell. Mol. Biol. 7:349-356; Samulski et al., (1989) J. Virol. 63:3822-3828; and McLaughlin et al., (1989) J. Virol. 62:1963-1973). Vectors containing as little as 300 base pairs of AAV can be packaged and can integrate. Space for exogenous DNA is limited to about 4.5 kb. An AAV vector such as that described in Tratschin et al., (1985) Mol. Cell. Biol. 5:3251-3260 can be used to introduce DNA into cells. A variety of nucleic acids have been introduced into different cell types using AAV vectors (see for example Hermonat et al., (1984) PNAS USA 81:6466-6470; Tratschin et al., (1985) Mol. Cell. Biol. 4:2072-2081; Wondisford et al., (1988) Mol. Endocrinol. 2:32-39; Tratschin et al., (1984) J. Virol. 51:611-619; and Flotte et al., (1993) J. Biol. Chem. 268:3781-3790).

[0142] Other viral vector systems that may have application in gene therapy have been derived from herpes virus, vaccinia virus, and several RNA viruses. In particular, herpes virus vectors may provide a unique strategy for persistence of the recombinant gene in cells of the central nervous system and ocular tissue (Pepose et al., (1994) Invest Ophthalmol Vis Sci 35:2662-2666). In addition to viral transfer methods, such as those illustrated above, non-viral methods can also be employed to cause expression of a protein in the tissue of an animal. Most nonviral methods of gene transfer rely on normal mechanisms used by mammalian cells for the uptake and intracellular transport of macromolecules. In preferred embodiments, non-viral gene delivery systems of the present invention rely on endocytic pathways for the uptake of the gene by the targeted cell. Exemplary gene delivery systems of this type include liposomal derived systems, poly-lysine conjugates, and artificial viral envelopes.

[0143] In a representative embodiment, a gene encoding a composite activator can be entrapped in liposomes bearing positive charges on their surface (e.g., lipofectins) and (optionally) which are tagged with antibodies against cell surface antigens of the target tissue (Mizuno et al., (1992) No Shinkei Geka 20:547-551; PCT publication WO91/06309; Japanese patent application 1047381; and European patent publication EP-A-43075). For example, lipofection of neuroglioma cells can be carried out using liposomes tagged with monoclonal antibodies against glioma-associated antigen (Mizuno et al., (1992) Neurol. Med. Chir. 32:873-876).

[0144] In yet another illustrative embodiment, the gene delivery system comprises an antibody or cell surface ligand which is cross-linked with a gene binding agent such as poly-lysine (see, for example, PCT publications WO93/04701, WO92/22635, WO92/20316, WO92/19749, and WO92/06180). For example, any of the subject gene constructs can be used to transfect specific cells in vivo using a soluble polynucleotide carrier comprising an antibody conjugated to a polycation, e.g. poly-lysine (see U.S. Pat. No. 5,166,320). It will also be appreciated that effective delivery of the subject nucleic acid constructs via receptor-mediated endocytosis can be improved using agents which enhance escape of the gene from the endosomal structures. For instance, whole adenovirus or fusogenic peptides of the influenza HA gene product can be used as part of the delivery system to induce efficient disruption of DNA-containing endosomes (Mulligan et al., (1993) Science 260:926; Wagner et al., (1992) PNAS USA 89:7934; and Christiano et al., (1993) PNAS USA 90:2122).

[0145] In clinical settings, the gene delivery systems can be introduced into a patient by any of a number of methods, each of which is familiar in the art.

[0146] For instance, a pharmaceutical preparation of the gene delivery system can be introduced systemically, e.g. by intravenous injection, and specific transduction of the construct in the target cells occurs predominantly from specificity of transfection provided by the gene delivery vehicle, cell-type or tissue-type expression due to the transcriptional regulatory sequences controlling expression of the gene, or a combination thereof. In other embodiments, initial delivery of the recombinant gene is more limited with introduction into the animal being quite localized. For example, the gene delivery vehicle can be introduced by catheter (see U.S. Pat. No. 5,328,470) or by stereotactic injection (e.g. Chen et al., (1994) PNAS USA 91:3054-3057).

[0147] In some embodiments of the invention, the target gene to be regulated by the regulatable intein is an endogenous gene, which contains an exogenous regulatable intein sequence. The exogenous regulatable intein sequence can be inserted into the endogenous gene's coding sequence. In certain embodiments, the endogenous target gene encodes a DNA binding protein, capable of binding with high affinity and specificity to a target sequence. In a preferred embodiment, the DNA binding protein is human. However, the DNA binding protein can be from any other species. For example, the DNA binding protein can be the yeast GAL4 protein.

[0148] In other embodiments, the target gene to be regulated by the regulatable intein is an exogenous gene. In some embodiments, the exogenous gene is integrated into the chromosomal DNA of a cell. The exogenous gene can be inserted into the chromosomal DNA, or the exogenous gene can substitute for at least a portion of an endogenous gene. Alternatively, the exogenous gene can be present on an extrachromosomal DNA element, such as a plasmid or a viral vector. The target gene can be present in a single copy or in multiple copies. In view of the experimental results described herein, it is not necessary that the target gene be present in more than one copy. However, if even higher levels of the protein encoded by the target gene are desired, multiple copies of the gene can be used.

[0149] A wide variety of genes can be employed as the target gene, including genes that encode a therapeutic protein. The target gene can be any sequence of interest which provides a desired phenotype. It can encode a surface membrane protein, a secreted protein, a cytoplasmic protein, or there can be a plurality of target genes encoding different products. The proteins which are expressed, singly or in combination, can involve homing, cytotoxicity, proliferation, immune response, inflammatory response, clotting or dissolving of clots, hormonal regulation, etc. The proteins expressed may be naturally-occurring proteins, mutants of naturally-occurring proteins, unique sequences, or combinations thereof.

[0150] Various secreted products include hormones, such as insulin, human growth hormone, glucagon, pituitary releasing factor, ACTH, melanotropin, relaxin, etc.; growth factors, such as EGF, IGF-1, TGF-α, -β, PDGF, G-CSF, M-CSF, GM-CSF, FGF, erythropoietin, thrombopoietin, megakaryocytic stimulating and growth factors, etc.; interleukins, such as IL-1 to -13; TNF-α and -β, etc.; and enzymes and other factors, such as tissue plasminogen activator, members of the complement cascade, perforins, superoxide dismutase, coagulation factors, antithrombin-III, Factor VIIIc, Factor VIIIvW, Factor IX, α-antitrypsin, protein C, protein S, endorphins, dynorphin, bone morphogenetic protein, CFTR, etc.

[0151] The gene can encode a naturally-occurring surface membrane protein or a protein made so by introduction of an appropriate signal peptide and transmembrane sequence. Various such proteins include homing receptors, e.g. L-selectin (Mel-14), blood-related proteins, particularly having a kringle structure, e.g. Factor VIIIc, Factor VIIIvW, hematopoietic cell markers, e.g. CD3, CD4, CD8, B cell receptor, TCR subunits α, β, γ, δ, CD10, CD19, CD28, CD33, CD38, CD41, etc., receptors such as the interleukin receptors IL-2R, IL-4R, etc., channel proteins, for influx or efflux of ions, e.g. H+, Ca2+, K+, Na+, Cl−, etc., and the like; CFTR, tyrosine activation motif, zap-70, etc.

[0152] Proteins may be modified for transport to a vesicle for exocytosis. By adding the sequence from a protein which is directed to vesicles, where the sequence is modified proximal to one or the other terminus, or situated in an analogous position to the protein source, the modified protein will be directed to the Golgi apparatus for packaging in a vesicle. This process in conjunction with the presence of the chimeric proteins for exocytosis allows for rapid transfer of the proteins to the extracellular medium and a relatively high localized concentration.

[0153] Also, intracellular proteins can be of interest, such as proteins in metabolic pathways, regulatory proteins, steroid receptors, transcription factors, etc., depending upon the nature of the host cell. Some of the proteins indicated above can also serve as intracellular proteins.

[0154] By way of further illustration, in T-cells, one may wish to introduce genes encoding one or both chains of a T-cell receptor. For B-cells, one could provide the heavy and light chains for an immunoglobulin for secretion. For cutaneous cells, e.g. keratinocytes, particularly keratinocyte stem cells, one could provide for protection against infection, by secreting α-, β- or γ-interferon, antichemotactic factors, proteases specific for bacterial cell wall proteins, etc.

[0155] In addition to providing for expression of a gene having therapeutic value, there will be many situations where one may wish to direct a cell to a particular site. The site can include anatomical sites, such as lymph nodes, mucosal tissue, skin, synovium, lung or other internal organs or functional sites, such as clots, injured sites, sites of surgical manipulation, inflammation, infection, etc. By providing for expression of surface membrane proteins which will direct the host cell to the particular site by providing for binding at the host target site to a naturally-occurring epitope, localized concentrations of a secreted product can be achieved. Proteins of interest include homing receptors, e.g. L-selectin, GMP140, CLAM-1, etc., or addressins, e.g. ELAM-1, PNAd, LNAd, etc., clot binding proteins, or cell surface proteins that respond to localized gradients of chemotactic factors. There are numerous situations where one would wish to direct cells to a particular site, where release of a therapeutic product could be of great value.

[0156] For use in gene therapy, the target gene can encode any gene product that is beneficial to a subject. The gene product can be a secreted protein, a membraneous protein, or a cytoplasmic protein. Preferred secreted proteins include growth factors, differentiation factors, cytokines, interleukins, tPA, and erythropoietin. Preferred membraneous proteins include receptors, e.g., growth factor or cytokine receptors or proteins mediating apoptosis, e.g., Fas receptor. Other candidate therapeutic genes are disclosed in PCT/US93/01617.

[0157] In yet another embodiment, a “gene activation” construct which, by homologous recombination with a genomic DNA, alters the transcriptional regulatory sequences of an endogenous gene, can be used to introduce recognition elements for a DNA binding activity of one of the subject engineered proteins. A variety of different formats for the gene activation constructs are available. See, for example, the Transkaryotic Therapies, Inc. PCT publications WO93/09222, WO95/31560, WO96/29411 and WO94/12650.

4.8. Kits

[0158] This invention further provides kits useful for the foregoing applications. One such kit contains one or more nucleic acids encoding a chimeric polypeptide comprising a target polypeptide having a bioactivity and a regulatable intein, which is inserted into the target polypeptide. The kit may further comprise additional nucleic acids such as specialized vectors which contain a cloning site for insertion of a desired target gene by the practitioner. For example, a preferred kit would contain a cloning site comprising at least one restriction site for insertion of an N-Extein of a target polypeptide, which is supplied by the user of the kit. In preferred embodiments, the cloning site is a polylinker. In preferred embodiments, this N-Extein cloning site is followed by a regulatable Intein sequence. In particularly preferred embodiments, the N-Extein cloning site of the vector is made available to the user in all three possible reading frames by supplying three different versions of the vector corresponding to single nucleotide insertions at the cloning site so that an in-frame fusion of the N-Extein to the regulatable Intein occurs. In preferred embodiments, the regulatable Intein sequence is further followed by a cloning site for a C-Extein element of the target sequence, which target may be supplied by the user. In still more preferred embodiments, versions of the vector corresponding to all three possible reading frames between the regulatable intein and the C-extein are made available to the user. For regulatable applications, i.e., in cases in which the recombinant protein contains a ligand binding domain or inducible domain, the kit may further contain an oligomerizing agent, such as the macrolide dimerizers discussed above. Such kits may for example contain a sample of a dimerizing agent capable of dimerizing the two recombinant proteins and activating transcription of the target.
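
The reading-frame logic behind supplying three vector versions can be illustrated with a minimal Python sketch. It is offered only as an illustration under stated assumptions: the filler nucleotides in FILLERS, the function names and the optional site_remainder argument are hypothetical and are not taken from this disclosure. The sketch computes how many nucleotides (0, 1 or 2) must separate a user-supplied N-Extein insert from the Intein so that the Intein begins on a codon boundary.

```python
# Hypothetical sketch: choose which of three frame-shifted vector versions
# keeps a user-supplied N-extein in frame with the downstream intein.
# FILLERS models the assumed 0-, 1- and 2-nucleotide insertions.
FILLERS = ("", "G", "GG")


def pick_vector_version(n_extein: str, site_remainder: str = "") -> int:
    """Return the index (0, 1 or 2) of the filler that restores the frame.

    n_extein: coding sequence of the insert, starting at its first codon.
    site_remainder: cloning-site nucleotides left between the insert and
        the filler after ligation (an assumption; may be empty).
    """
    upstream = len(n_extein) + len(site_remainder)
    # Number of nucleotides needed to complete the last partial codon.
    return (-upstream) % 3


def fuse(n_extein: str, intein: str, site_remainder: str = "") -> str:
    """Assemble N-extein + site remainder + chosen filler + intein."""
    idx = pick_vector_version(n_extein, site_remainder)
    return n_extein + site_remainder + FILLERS[idx] + intein


def in_frame(fusion: str, intein: str) -> bool:
    """True if the intein starts on a codon boundary of the fusion."""
    return fusion.endswith(intein) and (len(fusion) - len(intein)) % 3 == 0
```

The same calculation would apply at the junction between the regulatable Intein and a C-Extein cloning site.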

[0159] Constructs may be designed in accordance with the principles, illustrative examples and materials and methods disclosed in the patent documents and scientific literature cited herein, each of which is incorporated herein by reference, with modifications and further exemplification as described herein. Components of the constructs can be prepared in conventional ways, where the coding sequences and regulatory regions may be isolated, as appropriate, ligated, cloned in an appropriate cloning host, analyzed by restriction or sequencing, or other convenient means. Particularly, using PCR, individual fragments including all or portions of a functional unit may be isolated, where one or more mutations may be introduced using “primer repair”, ligation, in vitro mutagenesis, etc. as appropriate. In the case of DNA constructs encoding chimeric proteins, DNA sequences encoding individual domains and sub-domains are joined such that they constitute a single open reading frame encoding a chimeric protein capable of being translated in cells or cell lysates into a single polypeptide harboring all component domains. The DNA construct encoding the chimeric protein may then be placed into a vector that directs the expression of the protein in the appropriate cell type(s). For biochemical analysis of the encoded chimera, it may be desirable to construct plasmids that direct the expression of the protein in bacteria or in reticulocyte-lysate systems. For use in the production of proteins in mammalian cells, the protein-encoding sequence is introduced into an expression vector that directs expression in these cells. Expression vectors suitable for such uses are well known in the art. Various sorts of such vectors are commercially available.

4.9. Transgenic Organisms

[0160] The invention provides transgenic plants and animals which carry one or more intein modified target genes which can be regulated. These transgenic organisms can be generated with the nucleic acid target gene:intein hybrids of the invention. For example, the invention further provides for transgenic animals, which can be used for a variety of purposes, e.g., to study the function of a target gene. The transgenic animals of the invention can be animals expressing a transgene encoding a target:intein hybrid protein or fragment thereof or variants thereof, including mutants and polymorphic variants thereof. These animals can be used to determine the effect of expression of a target gene protein in a specific site or in a specific temporal window. In one aspect, the invention features a cell or cell line, which contains a knock-in of an intein which has been inserted into a particular target gene. In a preferred embodiment, the cell or cell line is an undifferentiated cell, for example, a stem cell, embryonic stem cell, oocyte or embryonic cell.

[0161] In yet a further aspect, the invention features a method of producing a non-human mammal with a targeted disruption in a target gene. For example, a target gene knock-out construct can be created with a portion of the target gene having an internal portion of said target gene replaced by a marker. The knock-out construct can then be transfected into a population of embryonic stem (ES) cells. Transfected cells can then be selected as expressing the marker. The transfected ES cells can then be introduced into an embryo of an ancestor of said mammal. The embryo can be allowed to develop to term to produce a chimeric mammal with the knock-out construct in its germline. Breeding said chimeric mammal will produce a heterozygous mammal with a targeted disruption in the target gene. Homozygotes can be generated by crossing heterozygotes.
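
The expected outcome of the heterozygote cross mentioned above can be worked through with a short Python sketch. It assumes simple Mendelian segregation of a single targeted locus, with "+" denoting the wild-type allele and "-" the disrupted allele; the notation and the function name are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent_a: str, parent_b: str) -> dict[str, Fraction]:
    """Expected genotype fractions from crossing two diploid parents.

    Each parent is a two-character string of alleles, e.g. "+-" for a
    heterozygote. Assumes independent, equally likely allele transmission.
    """
    # Enumerate the four equally likely gamete pairings; sort the alleles
    # so that "+-" and "-+" are counted as the same genotype.
    counts = Counter(
        "".join(sorted(a + b)) for a, b in product(parent_a, parent_b)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}
```

Under these assumptions, crossing two heterozygotes ("+-" with "+-") is expected to give one quarter homozygous disrupted ("--") offspring, one half heterozygotes and one quarter wild type.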

[0162] In another aspect, the invention features target knock-out constructs, which can be used to generate the animals described above. In one embodiment, the target construct can comprise a portion of the target gene, wherein an internal portion of said target gene is replaced by a selectable marker. Preferably, the marker is the neo gene and the portion of the target gene is at least 2.5 kb long or 7.0 or 9.5 kb long (including the replaced portion and any target flanking sequences). The internal portion preferably covers at least a portion of an exon and in some embodiments it covers all of the exons which encode a target polypeptide.

[0163] Yet other non-human animals within the scope of the invention include those in which the expression of the endogenous target gene has been mutated or “knocked out”. A “knock out” animal is one carrying a homozygous or heterozygous deletion of a particular gene or genes. These animals could be useful to determine whether the absence of the target polypeptide will result in a specific phenotype, in particular whether these animals have or are likely to develop a specific disease, such as high susceptibility to heart disease or cancer. Furthermore, these animals are useful in screens for drugs which alleviate or attenuate the disease condition resulting from the mutation of the target gene as outlined below. These animals are also useful for determining the effect of a specific amino acid difference, or allelic variation, in a target gene.

[0164] In a preferred embodiment of this aspect of the invention, a transgenic target gene knock-in mouse, carrying the mutated target locus on one or both of its chromosomes, is used as a model system for transgenic or drug treatment of the condition resulting from loss of target gene expression.

[0165] Methods for obtaining transgenic and knockout non-human animals are well known in the art. Knock out mice are generated by homologous integration of a “knock out” construct into a mouse embryonic stem cell chromosome which encodes the gene to be knocked out. In one embodiment, gene targeting, which is a method of using homologous recombination to modify an animal's genome, can be used to introduce changes into cultured embryonic stem cells. By targeting a specific gene of interest in ES cells, these changes can be introduced into the germlines of animals to generate chimeras. The gene targeting procedure is accomplished by introducing into tissue culture cells a DNA targeting construct that includes a segment homologous to a target locus, and which also includes an intended sequence modification to the target genomic sequence (e.g., insertion, deletion, point mutation). The treated cells are then screened for accurate targeting to identify and isolate those which have been properly targeted.

[0166] Gene targeting in embryonic stem cells is in fact a scheme contemplated by the present invention as a means for disrupting a target gene function through the use of a targeting transgene construct designed to undergo homologous recombination with one or more target genomic sequences. The targeting construct can be arranged so that, upon recombination with an element of a target gene, a positive selection marker is inserted into (or replaces) coding sequences of the gene. The inserted sequence functionally disrupts the target gene, while also providing a positive selection trait. Exemplary targeting constructs are described in more detail below.

[0167] Generally, the embryonic stem cells (ES cells) used to produce the knockout animals will be of the same species as the knockout animal to be generated. Thus for example, mouse embryonic stem cells will usually be used for generation of knockout mice.

[0168] Embryonic stem cells are generated and maintained using methods well known to the skilled artisan such as those described by Doetschman et al. (1985) J. Embryol. Exp. Morphol. 87:27-45. Any line of ES cells can be used; however, the line chosen is typically selected for the ability of the cells to integrate into and become part of the germ line of a developing embryo so as to create germ line transmission of the knockout construct. Thus, any ES cell line that is believed to have this capability is suitable for use herein. One mouse strain that is typically used for production of ES cells is the 129J strain. Another ES cell line is murine cell line D3 (American Type Culture Collection, catalog no. CRL 1934). Still another preferred ES cell line is the WW6 cell line (Ioffe et al. (1995) PNAS 92:7357-7361). The cells are cultured and prepared for knockout construct insertion using methods well known to the skilled artisan, such as those set forth by Robertson in: Teratocarcinomas and Embryonic Stem Cells: A Practical Approach, E. J. Robertson, ed. (IRL Press, Washington, D.C. [1987]); by Bradley et al. (1986) Current Topics in Devel. Biol. 20:357-371; and by Hogan et al. (Manipulating the Mouse Embryo: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. [1986]).

[0169] A knock out construct refers to a uniquely configured fragment of nucleic acid which is introduced into a stem cell line and allowed to recombine with the genome at the chromosomal locus of the gene of interest to be mutated. Thus a given knock out construct is specific for a given gene to be targeted for disruption. Nonetheless, many common elements exist among these constructs and these elements are well known in the art. A typical knock out construct contains nucleic acid fragments of not less than about 0.5 kb nor more than about 10.0 kb from both the 5′ and the 3′ ends of the genomic locus which encodes the gene to be mutated. These two fragments are separated by an intervening fragment of nucleic acid which encodes a positive selectable marker, such as the neomycin resistance gene (neo^(R)). The resulting nucleic acid fragment, consisting of a nucleic acid from the extreme 5′ end of the genomic locus linked to a nucleic acid encoding a positive selectable marker which is in turn linked to a nucleic acid from the extreme 3′ end of the genomic locus of interest, omits most of the coding sequence for the target or other gene of interest to be knocked out. When the resulting construct recombines homologously with the chromosome at this locus, it results in the loss of the omitted coding sequence, otherwise known as the structural gene, from the genomic locus. A stem cell in which such a rare homologous recombination event has taken place can be selected for by virtue of the stable integration into the genome of the nucleic acid of the gene encoding the positive selectable marker and subsequent selection for cells expressing this marker gene in the presence of an appropriate drug (neomycin in this example).
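
The arrangement of a typical knock out construct described above (5′ fragment, positive selectable marker, 3′ fragment) can be sketched in Python as follows. This is only a minimal illustration: the function name is hypothetical, sequences are treated as plain strings, and the length limits simply encode the approximate 0.5 kb to 10.0 kb range stated in the text.

```python
MIN_ARM_BP = 500      # about 0.5 kb, lower bound stated in the text
MAX_ARM_BP = 10_000   # about 10.0 kb, upper bound stated in the text


def build_knockout_construct(arm5: str, marker: str, arm3: str) -> str:
    """Join 5' fragment + positive selectable marker + 3' fragment.

    Raises ValueError if either genomic fragment falls outside the
    approximate size range given for a typical knock out construct.
    """
    for name, arm in (("5' fragment", arm5), ("3' fragment", arm3)):
        if not MIN_ARM_BP <= len(arm) <= MAX_ARM_BP:
            raise ValueError(
                f"{name} is {len(arm)} bp; expected "
                f"{MIN_ARM_BP}-{MAX_ARM_BP} bp"
            )
    # The marker replaces the omitted coding sequence between the fragments.
    return arm5 + marker + arm3
```

A "knock-in" arrangement of the kind described in the following paragraph would use the same layout, except that the two genomic fragments would be contiguous in the genome.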

[0170] Variations on this basic technique also exist and are well known in the art. For example, a “knock-in” construct refers to the same basic arrangement of a nucleic acid encoding a 5′ genomic locus fragment linked to nucleic acid encoding a positive selectable marker which in turn is linked to a nucleic acid encoding a 3′ genomic locus fragment, but which differs in that none of the coding sequence is omitted and thus the 5′ and the 3′ genomic fragments used were initially contiguous before being disrupted by the introduction of the nucleic acid encoding the positive selectable marker gene. This “knock-in” type of construct is thus very useful for the construction of mutant transgenic animals when only a limited region of the genomic locus of the gene to be mutated, such as a single exon, is available for cloning and genetic manipulation. Alternatively, the “knock-in” construct can be used to specifically eliminate a single functional domain of the targeted gene, resulting in a transgenic animal which expresses a polypeptide of the targeted gene which is defective in one function, while retaining the function of other domains of the encoded polypeptide. This type of “knock-in” mutant frequently has the characteristic of a so-called “dominant negative” mutant because, especially in the case of proteins which homomultimerize, it can specifically block the action of (or “poison”) the polypeptide product of the wild-type gene from which it was derived. In a variation of the knock-in technique, a marker gene is integrated at the genomic locus of interest such that expression of the marker gene comes under the control of the transcriptional regulatory elements of the targeted gene. A marker gene is one that encodes an enzyme whose activity can be detected (e.g., β-galactosidase); the enzyme substrate can be added to the cells under suitable conditions, and the enzymatic activity can be analyzed.
One skilled in the art will be familiar with other useful markers and the means for detecting their presence in a given cell. All such markers are contemplated as being included within the scope of the teaching of this invention.

[0171] As mentioned above, the homologous recombination of the above described “knock out” and “knock in” constructs is very rare, and frequently such a construct inserts nonhomologously into a random region of the genome where it has no effect on the gene which has been targeted for deletion, and where it can potentially recombine so as to disrupt another gene which was otherwise not intended to be altered. Such nonhomologous recombination events can be selected against by modifying the abovementioned knock out and knock in constructs so that they are flanked by negative selectable markers at either end (particularly through the use of two allelic variants of the thymidine kinase gene, the polypeptide product of which can be selected against in expressing cell lines in an appropriate tissue culture medium well known in the art, i.e. one containing a drug such as 5-bromodeoxyuridine). Thus a preferred embodiment of such a knock out or knock in construct of the invention consists of a nucleic acid encoding a negative selectable marker linked to a nucleic acid encoding a 5′ end of a genomic locus linked to a nucleic acid of a positive selectable marker which in turn is linked to a nucleic acid encoding a 3′ end of the same genomic locus which in turn is linked to a second nucleic acid encoding a negative selectable marker. Nonhomologous recombination between the resulting knock out construct and the genome will usually result in the stable integration of one or both of these negative selectable marker genes, and hence cells which have undergone nonhomologous recombination can be selected against by growth in the appropriate selective media (e.g. media containing a drug such as 5-bromodeoxyuridine). Simultaneous selection for the positive selectable marker and against the negative selectable marker will result in a vast enrichment for clones in which the knock out construct has recombined homologously at the locus of the gene intended to be mutated. The presence of the predicted chromosomal alteration at the targeted gene locus in the resulting knock out stem cell line can be confirmed by means of Southern blot analytical techniques which are well known to those skilled in the art. Alternatively, PCR can be used.
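The logic of the combined positive and negative selection described above can be illustrated with a minimal sketch. The clone names and marker flags below are hypothetical and chosen only for illustration; real selection is not perfectly clean, since nonhomologous integration only usually carries a negative marker along with it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    """A hypothetical ES cell clone after exposure to the targeting construct."""
    name: str
    has_positive_marker: bool  # marker located between the 5' and 3' homology regions
    has_negative_marker: bool  # marker flanking the construct, outside the homology regions


def survives_double_selection(clone: Clone) -> bool:
    """A clone survives only if it kept the positive marker and lost both negative markers."""
    return clone.has_positive_marker and not clone.has_negative_marker


# Illustrative outcomes (assumed, not measured data).
clones = [
    Clone("homologous_recombinant", True, False),   # flanking markers lost on crossover
    Clone("random_integrant", True, True),          # whole construct inserted, marker retained
    Clone("no_integration", False, False),          # killed by the positive selection
]

survivors = [c.name for c in clones if survives_double_selection(c)]
print(survivors)
```

Only the clone that has integrated the positive marker while shedding the flanking negative markers passes both filters, which is why the simultaneous selection enriches for homologous recombinants.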

[0172] Each knockout construct to be inserted into the cell must first be in the linear form. Therefore, if the knockout construct has been inserted into a vector (described infra), linearization is accomplished by digesting the DNA with a suitable restriction endonuclease selected to cut only within the vector sequence and not within the knockout construct sequence.

[0173] For insertion, the knockout construct is added to the ES cells under appropriate conditions for the insertion method chosen, as is known to the skilled artisan. For example, if the ES cells are to be electroporated, the ES cells and knockout construct DNA are exposed to an electric pulse using an electroporation machine and following the manufacturer's guidelines for use. After electroporation, the ES cells are typically allowed to recover under suitable incubation conditions. The cells are then screened for the presence of the knock out construct as explained above. Where more than one construct is to be introduced into the ES cell, each knockout construct can be introduced simultaneously or one at a time.

[0174] After suitable ES cells containing the knockout construct in the proper location have been identified by the selection techniques outlined above, the cells can be inserted into an embryo. Insertion may be accomplished in a variety of ways known to the skilled artisan; however, a preferred method is by microinjection. For microinjection, about 10-30 cells are collected into a micropipet and injected into embryos that are at the proper stage of development to permit integration of the foreign ES cell containing the knockout construct into the developing embryo. For instance, the transformed ES cells can be microinjected into blastocysts. The suitable stage of development for the embryo used for insertion of ES cells is very species dependent; however, for mice it is about 3.5 days. The embryos are obtained by perfusing the uterus of pregnant females. Suitable methods for accomplishing this are known to the skilled artisan, and are set forth by, e.g., Bradley et al. (supra).

[0175] While any embryo of the right stage of development is suitable for use, preferred embryos are male. In mice, the preferred embryos also have genes coding for a coat color that is different from the coat color encoded by the ES cell genes. In this way, the offspring can be screened easily for the presence of the knockout construct by looking for mosaic coat color (indicating that the ES cell was incorporated into the developing embryo). Thus, for example, if the ES cell line carries the genes for white fur, the embryo selected will carry genes for black or brown fur.

[0176] After the ES cell has been introduced into the embryo, the embryo may be implanted into the uterus of a pseudopregnant foster mother for gestation. While any foster mother may be used, the foster mother is typically selected for her ability to breed and reproduce well, and for her ability to care for the young. Such foster mothers are typically prepared by mating with vasectomized males of the same species. The stage of the pseudopregnant foster mother is important for successful implantation, and it is species dependent. For mice, this stage is about 2-3 days pseudopregnant.

[0177] Offspring that are born to the foster mother may be screened initially for mosaic coat color where the coat color selection strategy (as described above, and in the appended examples) has been employed. In addition, or as an alternative, DNA from tail tissue of the offspring may be screened for the presence of the knockout construct using Southern blots and/or PCR as described above. Offspring that appear to be mosaics may then be crossed to each other, if they are believed to carry the knockout construct in their germ line, in order to generate homozygous knockout animals. Homozygotes may be identified by Southern blotting of equivalent amounts of genomic DNA from mice that are the product of this cross, as well as mice that are known heterozygotes and wild type mice.

[0178] Other means of identifying and characterizing the knockout offspring are available. For example, Northern blots can be used to probe the mRNA for the presence or absence of transcripts encoding either the gene knocked out, the marker gene, or both. In addition, Western blots can be used to assess the level of expression of the target gene knocked out in various tissues of the offspring by probing the Western blot with an antibody against the particular target protein, or an antibody against the marker gene product, where this gene is expressed. Finally, in situ analysis (such as fixing the cells and labeling with antibody) and/or FACS (fluorescence activated cell sorting) analysis of various cells from the offspring can be conducted using suitable antibodies to look for the presence or absence of the knockout construct gene product.

[0179] Yet other methods of making knock-out or disruption transgenic animals are also generally known. See, for example, Manipulating the Mouse Embryo, (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1986). Recombinase dependent knockouts can also be generated, e.g. by homologous recombination to insert target sequences, such that inactivation of a target gene can be controlled in a tissue specific and/or temporal manner by recombinase sequences (described infra).

[0180] Animals containing more than one knockout construct and/or more than one transgene expression construct are prepared in any of several ways. The preferred manner of preparation is to generate a series of mammals, each containing one of the desired transgenic phenotypes. Such animals are bred together through a series of crosses, backcrosses and selections, to ultimately generate a single animal containing all desired knockout constructs and/or expression constructs, where the animal is otherwise congenic (genetically identical) to the wild type except for the presence of the knockout construct(s) and/or transgene(s).

[0181] A targeted transgene can encode the wild-type form of the protein, or can encode homologs thereof, including both agonists and antagonists, as well as antisense constructs. In preferred embodiments, the expression of the transgene is restricted to specific subsets of cells, tissues or developmental stages utilizing, for example, cis-acting sequences that control expression in the desired pattern. In the present invention, such mosaic expression of a target protein can be essential for many forms of lineage analysis and can additionally provide a means to assess the effects of, for example, lack of target gene expression which might grossly alter development in small patches of tissue within an otherwise normal embryo. Toward this end, tissue-specific regulatory sequences and conditional regulatory sequences can be used to control expression of the transgene in certain spatial patterns. Moreover, temporal patterns of expression can be provided by, for example, conditional recombination systems or prokaryotic transcriptional regulatory sequences.

[0182] Genetic techniques which allow the expression of transgenes to be regulated via site-specific genetic manipulation in vivo are known to those skilled in the art. For instance, genetic systems are available which allow for the regulated expression of a recombinase that catalyzes the genetic recombination of a target sequence. As used herein, the phrase “target sequence” refers to a nucleotide sequence that is genetically recombined by a recombinase. The target sequence is flanked by recombinase recognition sequences and is generally either excised or inverted in cells expressing recombinase activity. Recombinase catalyzed recombination events can be designed such that recombination of the target sequence results in either the activation or repression of expression of one of the subject target proteins. For example, excision of a target sequence which interferes with the expression of a recombinant target gene, such as one which encodes an antagonistic homolog or an antisense transcript, can be designed to activate expression of that gene. This interference with expression of the protein can result from a variety of mechanisms, such as spatial separation of the target gene from the promoter element or an internal stop codon. Moreover, the transgene can be made wherein the coding sequence of the gene is flanked by recombinase recognition sequences and is initially transfected into cells in a 3′ to 5′ orientation with respect to the promoter element. In such an instance, inversion of the target sequence will reorient the subject gene by placing the 5′ end of the coding sequence in an orientation with respect to the promoter element which allows for promoter driven transcriptional activation.

[0183] The transgenic animals of the present invention all include within a plurality of their cells a transgene of the present invention, which transgene alters the phenotype of the “host cell” with respect to regulation of cell growth, death and/or differentiation. Since it is possible to produce transgenic organisms of the invention utilizing one or more of the transgene constructs described herein, a general description will be given of the production of transgenic organisms by referring generally to exogenous genetic material. This general description can be adapted by those skilled in the art in order to incorporate specific transgene sequences into organisms utilizing the methods and materials described below.

[0184] In an illustrative embodiment, either the cre/loxP recombinase system of bacteriophage P1 (Lakso et al. (1992) PNAS 89:6232-6236; Orban et al. (1992) PNAS 89:6861-6865) or the FLP recombinase system of Saccharomyces cerevisiae (O'Gorman et al. (1991) Science 251:1351-1355; PCT publication WO 92/15694) can be used to generate in vivo site-specific genetic recombination systems. Cre recombinase catalyzes the site-specific recombination of an intervening target sequence located between loxP sequences. loxP sequences are 34 base pair nucleotide repeat sequences to which the Cre recombinase binds and are required for Cre recombinase mediated genetic recombination. The orientation of loxP sequences determines whether the intervening target sequence is excised or inverted when Cre recombinase is present (Abremski et al. (1984) J. Biol. Chem. 259:1509-1514): Cre recombinase catalyzes the excision of the target sequence when the loxP sequences are oriented as direct repeats, and catalyzes inversion of the target sequence when the loxP sequences are oriented as inverted repeats.
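The dependence of the outcome on loxP orientation can be illustrated with a toy model. The sketch below is not a sequence-level simulation: a construct is represented as a list of named elements, the tokens `loxP>` and `<loxP` are hypothetical labels for the two site orientations, and inversion is modeled simply as reversing the order of the intervening elements.

```python
LOXP_FWD = "loxP>"  # hypothetical label: loxP site in the forward orientation
LOXP_REV = "<loxP"  # hypothetical label: loxP site in the reverse orientation


def cre_recombine(elements):
    """Return the construct after one Cre-mediated event between two loxP sites.

    Direct repeats (same orientation): the intervening sequence is excised
    and a single loxP site remains.
    Inverted repeats (opposite orientation): the intervening sequence is
    inverted in place and both sites remain.
    """
    sites = [i for i, e in enumerate(elements) if e in (LOXP_FWD, LOXP_REV)]
    if len(sites) != 2:
        raise ValueError("this sketch expects exactly two loxP sites")
    first, second = sites
    if elements[first] == elements[second]:
        # Direct repeats: drop everything after the first site up to and
        # including the second site.
        return elements[:first + 1] + elements[second + 1:]
    # Inverted repeats: reverse the order of the intervening elements.
    inverted = elements[first + 1:second][::-1]
    return elements[:first + 1] + inverted + elements[second:]


# Excision of an interfering sequence places the gene next to the promoter.
floxed_stop = ["promoter", LOXP_FWD, "stop", LOXP_FWD, "target_gene"]
print(cre_recombine(floxed_stop))

# Inversion reorders the elements lying between oppositely oriented sites.
invertible = ["promoter", LOXP_FWD, "exon2", "exon1", LOXP_REV, "polyA"]
print(cre_recombine(invertible))
```

In the first case the interfering element is removed, which corresponds to activation of expression by excision; in the second the flanked elements are reoriented with respect to the promoter.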

[0185] Accordingly, genetic recombination of the target sequence is dependent on expression of the Cre recombinase. Expression of the recombinase can be regulated by promoter elements which are subject to regulatory control, e.g., tissue-specific, developmental stage-specific, inducible or repressible by externally added agents. This regulated control will result in genetic recombination of the target sequence only in cells where recombinase expression is mediated by the promoter element. Thus, the activation of expression of a recombinant target protein can be regulated via control of recombinase expression.

[0186] Use of the cre/loxP recombinase system to regulate expression of a recombinant target protein requires the construction of a transgenic animal containing transgenes encoding both the Cre recombinase and the subject protein. Animals containing both the Cre recombinase and a recombinant target gene can be provided through the construction of “double” transgenic animals. A convenient method for providing such animals is to mate two transgenic animals each containing a transgene, e.g., a target gene and recombinase gene.

[0187] One advantage of initially constructing transgenic animals containing a target transgene in a recombinase-mediated expressible format stems from the likelihood that the subject protein, whether agonistic or antagonistic, can be deleterious upon expression in the transgenic animal. In such an instance, a founder population, in which the subject transgene is silent in all tissues, can be propagated and maintained. Individuals of this founder population can be crossed with animals expressing the recombinase in, for example, one or more tissues and/or a desired temporal pattern. Thus, the creation of a founder population in which, for example, an antagonistic target transgene is silent will allow the study of progeny from that founder in which disruption of target mediated induction in a particular tissue or at certain developmental stages would result in, for example, a lethal phenotype.

[0188] Similar conditional transgenes can be provided using prokaryotic promoter sequences which require prokaryotic proteins to be simultaneously expressed in order to facilitate expression of the target transgene. Exemplary promoters and the corresponding trans-activating prokaryotic proteins are given in U.S. Pat. No. 4,833,080.

[0189] Moreover, expression of the conditional transgenes can be induced by gene therapy-like methods wherein a gene encoding the trans-activating protein, e.g. a recombinase or a prokaryotic protein, is delivered to the tissue and caused to be expressed, such as in a cell-type specific manner. By this method, a target gene:intein transgene could remain silent into adulthood until “turned on” by the introduction of the trans-activator.

[0190] In an exemplary embodiment, the “transgenic non-human animals” of the invention are produced by introducing transgenes into the germline of the non-human animal. Embryonal target cells at various developmental stages can be used to introduce transgenes. Different methods are used depending on the stage of development of the embryonal target cell. The specific line(s) of any animal used to practice this invention are selected for general good health, good embryo yields, good pronuclear visibility in the embryo, and good reproductive fitness. In addition, the haplotype is a significant factor. For example, when transgenic mice are to be produced, strains such as C57BL/6 or FVB lines are often used (Jackson Laboratory, Bar Harbor, Me.). Preferred strains are those with H-2b, H-2d or H-2q haplotypes such as C57BL/6 or DBA/1. The line(s) used to practice this invention may themselves be transgenics, and/or may be knockouts (i.e., obtained from animals which have one or more genes partially or completely suppressed).

[0191] In one embodiment, the transgene construct is introduced into a single stage embryo. The zygote is the best target for micro-injection. In the mouse, the male pronucleus reaches the size of approximately 20 micrometers in diameter which allows reproducible injection of 1-2 pl of DNA solution. The use of zygotes as a target for gene transfer has a major advantage in that in most cases the injected DNA will be incorporated into the host gene before the first cleavage (Brinster et al. (1985) PNAS 82:4438-4442). As a consequence, all cells of the transgenic animal will carry the incorporated transgene. This will in general also be reflected in the efficient transmission of the transgene to offspring of the founder since 50% of the germ cells will harbor the transgene.

[0192] Normally, fertilized embryos are incubated in suitable media until the pronuclei appear. At about this time, the nucleotide sequence comprising the transgene is introduced into the female or male pronucleus as described below. In some species such as mice, the male pronucleus is preferred. It is most preferred that the exogenous genetic material be added to the male DNA complement of the zygote prior to its being processed by the ovum nucleus or the zygote female pronucleus. It is thought that the ovum nucleus or female pronucleus release molecules which affect the male DNA complement, perhaps by replacing the protamines of the male DNA with histones, thereby facilitating the combination of the female and male DNA complements to form the diploid zygote.

[0193] Thus, it is preferred that the exogenous genetic material be added to the male complement of DNA or any other complement of DNA prior to its being affected by the female pronucleus. For example, the exogenous genetic material is added to the early male pronucleus, as soon as possible after the formation of the male pronucleus, which is when the male and female pronuclei are well separated and both are located close to the cell membrane. Alternatively, the exogenous genetic material could be added to the nucleus of the sperm after it has been induced to undergo decondensation. Sperm containing the exogenous genetic material can then be added to the ovum or the decondensed sperm could be added to the ovum with the transgene constructs being added as soon as possible thereafter.

[0194] Introduction of the transgene nucleotide sequence into the embryo may be accomplished by any means known in the art such as, for example, microinjection, electroporation, or lipofection. Following introduction of the transgene nucleotide sequence into the embryo, the embryo may be incubated in vitro for varying amounts of time, or reimplanted into the surrogate host, or both. In vitro incubation to maturity is within the scope of this invention. One common method is to incubate the embryos in vitro for about 1-7 days, depending on the species, and then reimplant them into the surrogate host.

[0195] For the purposes of this invention a zygote is essentially the formation of a diploid cell which is capable of developing into a complete organism. Generally, the zygote will be comprised of an egg containing a nucleus formed, either naturally or artificially, by the fusion of two haploid nuclei from a gamete or gametes. Thus, the gamete nuclei must be ones which are naturally compatible, i.e., ones which result in a viable zygote capable of undergoing differentiation and developing into a functioning organism. Generally, a euploid zygote is preferred. If an aneuploid zygote is obtained, then the number of chromosomes should not vary by more than one with respect to the euploid number of the organism from which either gamete originated.

[0196] In addition to similar biological considerations, physical ones also govern the amount (e.g., volume) of exogenous genetic material which can be added to the nucleus of the zygote or to the genetic material which forms a part of the zygote nucleus. If no genetic material is removed, then the amount of exogenous genetic material which can be added is limited by the amount which will be absorbed without being physically disruptive. Generally, the volume of exogenous genetic material inserted will not exceed about 10 picoliters. The physical effects of addition must not be so great as to physically destroy the viability of the zygote. The biological limit of the number and variety of DNA sequences will vary depending upon the particular zygote and functions of the exogenous genetic material and will be readily apparent to one skilled in the art, because the genetic material, including the exogenous genetic material, of the resulting zygote must be biologically capable of initiating and maintaining the differentiation and development of the zygote into a functional organism.

[0197] The number of copies of the transgene constructs which are added to the zygote is dependent upon the total amount of exogenous genetic material added and will be the amount which enables the genetic transformation to occur. Theoretically only one copy is required; however, generally, numerous copies are utilized, for example, 1,000-20,000 copies of the transgene construct, in order to ensure that one copy is functional. As regards the present invention, there will often be an advantage to having more than one functioning copy of each of the inserted exogenous DNA sequences to enhance the phenotypic expression of the exogenous DNA sequences.
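The relationship between copy number, DNA mass and injection volume can be worked through with a short calculation. The sketch below assumes an average mass of about 650 g/mol per base pair of double-stranded DNA, which is a common approximation rather than a value taken from this description; the 5,000 base pair construct length and the 2 picoliter injection volume are likewise illustrative assumptions.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
AVG_BP_MASS = 650.0        # g/mol per base pair of dsDNA (approximation, assumed)


def construct_mass_fg(copies: int, length_bp: int) -> float:
    """Mass in femtograms of `copies` molecules of a dsDNA construct."""
    grams = copies * length_bp * AVG_BP_MASS / AVOGADRO
    return grams * 1e15


def concentration_ng_per_ul(copies: int, length_bp: int, volume_pl: float) -> float:
    """Concentration needed to deliver `copies` molecules in `volume_pl` picoliters.

    1 fg/pl is numerically equal to 1 ng/ul, so no further conversion is needed.
    """
    return construct_mass_fg(copies, length_bp) / volume_pl


# Hypothetical example: 1,000 copies of a 5,000 bp construct in 2 pl.
mass = construct_mass_fg(1_000, 5_000)
conc = concentration_ng_per_ul(1_000, 5_000, 2.0)
print(f"{mass:.2f} fg in total, {conc:.2f} ng/ul in the injection solution")
```

Under these assumptions, a copy number at the low end of the stated range corresponds to only a few femtograms of DNA, which shows why the copy number delivered is governed mainly by the concentration of the injection solution and the injected volume.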

[0198] Any technique which allows for the addition of the exogenous genetic material into nucleic genetic material can be utilized so long as it is not destructive to the cell, nuclear membrane or other existing cellular or genetic structures. The exogenous genetic material is preferentially inserted into the nucleic genetic material by microinjection. Microinjection of cells and cellular structures is known and is used in the art.

[0199] Reimplantation is accomplished using standard methods. Usually, the surrogate host is anesthetized, and the embryos are inserted into the oviduct. The number of embryos implanted into a particular host will vary by species, but will usually be comparable to the number of offspring the species naturally produces.

[0200] Transgenic offspring of the surrogate host may be screened for the presence and/or expression of the transgene by any suitable method. Screening is often accomplished by Southern blot or Northern blot analysis, using a probe that is complementary to at least a portion of the transgene. Western blot analysis using an antibody against the protein encoded by the transgene may be employed as an alternative or additional method for screening for the presence of the transgene product. Typically, DNA is prepared from tail tissue and analyzed by Southern analysis or PCR for the transgene. Alternatively, the tissues or cells believed to express the transgene at the highest levels are tested for the presence and expression of the transgene using Southern analysis or PCR, although any tissues or cell types may be used for this analysis.

[0201] Alternative or additional methods for evaluating the presence of the transgene include, without limitation, suitable biochemical assays such as enzyme and/or immunological assays, histological stains for particular marker or enzyme activities, flow cytometric analysis, and the like. Analysis of the blood may also be useful to detect the presence of the transgene product in the blood, as well as to evaluate the effect of the transgene on the levels of various types of blood cells and other blood constituents.

[0202] Progeny of the transgenic animals may be obtained by mating the transgenic animal with a suitable partner, or by in vitro fertilization of eggs and/or sperm obtained from the transgenic animal. Where mating with a partner is to be performed, the partner may or may not be transgenic and/or a knockout; where it is transgenic, it may contain the same or a different transgene, or both. Alternatively, the partner may be a parental line. Where in vitro fertilization is used, the fertilized embryo may be implanted into a surrogate host or incubated in vitro, or both. Using either method, the progeny may be evaluated for the presence of the transgene using methods described above, or other appropriate methods.

[0203] The transgenic animals produced in accordance with the present invention will include exogenous genetic material. As set out above, the exogenous genetic material will, in certain embodiments, be a DNA sequence which results in the production of a target protein (either agonistic or antagonistic), an antisense transcript, or a target mutant. Further, in such embodiments the sequence will be attached to a transcriptional control element, e.g., a promoter, which preferably allows the expression of the transgene product in a specific type of cell.

[0204] Retroviral infection can also be used to introduce a transgene into a non-human animal. The developing non-human embryo can be cultured in vitro to the blastocyst stage. During this time, the blastomeres can be targets for retroviral infection (Jaenisch, R. (1976) PNAS 73:1260-1264). Efficient infection of the blastomeres is obtained by enzymatic treatment to remove the zona pellucida (Manipulating the Mouse Embryo, Hogan eds. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, 1986)). The viral vector system used to introduce the transgene is typically a replication-defective retrovirus carrying the transgene (Jahner et al. (1985) PNAS 82:6927-6931; Van der Putten et al. (1985) PNAS 82:6148-6152). Transfection is easily and efficiently obtained by culturing the blastomeres on a monolayer of virus-producing cells (Van der Putten, supra; Stewart et al. (1987) EMBO J 6:383-388). Alternatively, infection can be performed at a later stage. Virus or virus-producing cells can be injected into the blastocoele (Jahner et al. (1982) Nature 298:623-628). Most of the founders will be mosaic for the transgene since incorporation occurs only in a subset of the cells which formed the transgenic non-human animal. Further, the founder may contain various retroviral insertions of the transgene at different positions in the genome which generally will segregate in the offspring. In addition, it is also possible to introduce transgenes into the germ line by intrauterine retroviral infection of the midgestation embryo (Jahner et al. (1982) supra).

[0205] A third type of target cell for transgene introduction is the embryonal stem cell (ES). ES cells are obtained from pre-implantation embryos cultured in vitro and fused with embryos (Evans et al. (1981) Nature 292:154-156; Bradley et al. (1984) Nature 309:255-258; Gossler et al. (1986) PNAS 83:9065-9069; and Robertson et al. (1986) Nature 322:445-448). Transgenes can be efficiently introduced into the ES cells by DNA transfection or by retrovirus-mediated transduction. Such transformed ES cells can thereafter be combined with blastocysts from a non-human animal. The ES cells thereafter colonize the embryo and contribute to the germ line of the resulting chimeric animal. For review see Jaenisch, R. (1988) Science 240:1468-1474.

4.10. Screening Assays for Intein Signaling Agents

[0206] An intein signaling agent can be any type of compound, including a protein, a peptide, peptidomimetic, small molecule, and nucleic acid. A nucleic acid can be, e.g., a gene, an antisense nucleic acid, a ribozyme, or a triplex molecule. An intein signaling agent of the invention can be an agonist or an antagonist. Preferred intein agonists include intein-interacting proteins or derivatives thereof which affect an intein self-excision activity.

[0207] The invention also provides screening methods for identifying intein signaling agents which are capable of binding to an intein protein, e.g., a wild-type intein protein or a mutated form of an intein protein, and thereby modulate the self-excision activity of an intein or otherwise prevent the removal of the intein. For example, such an intein modulating agent can be an antibody or derivative thereof which interacts specifically with a wild-type intein protein and thereby antagonizes its self-excision activity. An intein modulating agent may also be a small molecule agonist which binds to a conditional mutant intein polypeptide and thereby activates the conditional mutant by, for example, stabilizing an active form of the conditional intein polypeptide. Thus, the invention provides screening methods for identifying intein agonist and antagonist compounds, comprising selecting compounds which are capable of interacting with an intein protein or with a molecule capable of interacting with an intein protein. In general, a molecule which is capable of interacting with an intein protein is referred to herein as “intein binding partner”.

[0208] The compounds of the invention can be identified using various assays depending on the type of compound and activity of the compound that is desired. In addition, as described herein, the test compounds can be further tested in animal models. Set forth below are at least some assays that can be used for identifying intein modulating agents. It is within the skill of the art to design additional assays for identifying intein modulating agents.

4.11. Cell-Free Assays

[0209] Cell-free assays can be used to identify compounds which are capable of interacting with an intein protein or binding partner, to thereby modify the activity of the intein protein or binding partner. Such a compound can, e.g., modify the structure of an intein protein or binding partner and thereby affect its activity. Cell-free assays can also be used to identify compounds which modulate the interaction between an intein protein and an intein binding partner, such as a target peptide. In a preferred embodiment, cell-free assays for identifying such compounds consist essentially of a reaction mixture containing an intein protein and a test compound or a library of test compounds in the presence or absence of a binding partner. A test compound can be, e.g., a derivative of an intein binding partner, e.g., a biologically inactive target peptide, or a small molecule.

[0210] Accordingly, one exemplary screening assay of the present invention includes the steps of contacting an intein protein or functional fragment thereof or an intein binding partner with a test compound or library of test compounds and detecting the formation of complexes. For detection purposes, the molecule can be labeled with a specific marker and the test compound or library of test compounds labeled with a different marker. Interaction of a test compound with an intein protein or fragment thereof or intein binding partner can then be detected by determining the level of the two labels after an incubation step and a washing step. The presence of two labels after the washing step is indicative of an interaction.

[0211] An interaction between molecules can also be identified by using real-time BIA (Biomolecular Interaction Analysis, Pharmacia Biosensor AB) which detects surface plasmon resonance (SPR), an optical phenomenon. Detection depends on changes in the mass concentration of macromolecules at the biospecific interface, and does not require any labeling of interactants. In one embodiment, a library of test compounds can be immobilized on a sensor surface, e.g., which forms one wall of a micro-flow cell. A solution containing the intein protein, functional fragment thereof, intein analog or intein binding partner is then flowed continuously over the sensor surface. A change in the resonance angle, as shown on a signal recording, indicates that an interaction has occurred. This technique is further described, e.g., in BIAtechnology Handbook by Pharmacia.

[0212] Another exemplary screening assay of the present invention includes the steps of (a) forming a reaction mixture including: (i) an intein polypeptide, (ii) an intein binding partner, and (iii) a test compound; and (b) detecting interaction of the intein and the intein binding protein. The intein polypeptide and intein binding partner can be produced recombinantly, purified from a source, e.g., plasma, or chemically synthesized, as described herein. A statistically significant change (potentiation or inhibition) in the interaction of the intein and intein binding protein in the presence of the test compound, relative to the interaction in the absence of the test compound, indicates a potential agonist (mimetic or potentiator) or antagonist (inhibitor) of intein self-excision bioactivity for the test compound. The compounds of this assay can be contacted simultaneously. Alternatively, an intein protein can first be contacted with a test compound for an appropriate amount of time, following which the intein binding partner is added to the reaction mixture. The efficacy of the compound can be assessed by generating dose response curves from data obtained using various concentrations of the test compound. Moreover, a control assay can also be performed to provide a baseline for comparison. In the control assay, isolated and purified intein polypeptide or binding partner is added to a composition containing the intein binding partner or intein polypeptide, and the formation of a complex is quantitated in the absence of the test compound.
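The comparison against the control assay and the use of a dose response curve can be sketched as follows. The readings are invented for illustration and are not experimental data; the function names are hypothetical, and the half-maximal concentration is estimated by simple interpolation on a logarithmic concentration axis rather than by the curve fitting a real analysis would typically use.

```python
import math


def percent_of_control(signal: float, control_signal: float) -> float:
    """Complex formation with test compound, as a percentage of the no-compound control."""
    return 100.0 * signal / control_signal


def interpolate_ic50(concentrations, responses):
    """Estimate the concentration at which the response crosses 50% of control.

    `concentrations` must be positive and ascending; `responses` are the
    matching percent-of-control values. Interpolates linearly in log10
    concentration between the two points that bracket 50%. Returns None if
    the curve never crosses 50%.
    """
    points = list(zip(concentrations, responses))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        if (r_lo - 50.0) * (r_hi - 50.0) <= 0 and r_lo != r_hi:
            fraction = (r_lo - 50.0) / (r_lo - r_hi)
            log_c = math.log10(c_lo) + fraction * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


# Hypothetical readings: control (no compound) and four compound concentrations.
control = 2000.0
doses_um = [0.1, 1.0, 10.0, 100.0]
raw_signals = [1900.0, 1500.0, 500.0, 100.0]

responses = [percent_of_control(s, control) for s in raw_signals]
ic50 = interpolate_ic50(doses_um, responses)
print(responses, ic50)
```

A compound whose responses fall with increasing concentration, as in this invented data set, would be a candidate antagonist of complex formation; whether the change is statistically significant would still have to be established from replicate measurements.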

[0213] Complex formation between an intein protein and an intein binding partner may be detected by a variety of techniques. Modulation of the formation of complexes can be quantitated using, for example, detectably labeled proteins such as radiolabeled, fluorescently labeled, or enzymatically labeled intein proteins or intein binding partners, by immunoassay, or by chromatographic detection.

[0214] Typically, it will be desirable to immobilize either the intein or its binding partner to facilitate separation of complexes from uncomplexed forms of one or both of the proteins, as well as to accommodate automation of the assay. Binding of an intein to an intein binding partner can be accomplished in any vessel suitable for containing the reactants. Examples include microtitre plates, test tubes, and micro-centrifuge tubes. In one embodiment, a fusion protein can be provided which adds a domain that allows the protein to be bound to a matrix. For example, glutathione-S-transferase/intein (GST/intein) fusion proteins can be adsorbed onto glutathione sepharose beads (Sigma Chemical, St. Louis, Mo.) or glutathione derivatized microtitre plates, which are then combined with the intein binding partner, e.g. a 35S-labeled intein binding partner, and the test compound, and the mixture incubated under conditions conducive to complex formation, e.g. at physiological conditions for salt and pH, though slightly more stringent conditions may be desired. Following incubation, the beads are washed to remove any unbound label, and the matrix immobilized and radiolabel determined directly (e.g. beads placed in scintillant), or in the supernatant after the complexes are subsequently dissociated. Alternatively, the complexes can be dissociated from the matrix, separated by SDS-PAGE, and the level of intein protein or intein binding partner found in the bead fraction quantitated from the gel using standard electrophoretic techniques.

[0215] Other techniques for immobilizing proteins on matrices are also available for use in the subject assay. For instance, either the intein or its cognate binding partner can be immobilized utilizing conjugation of biotin and streptavidin. Biotinylated intein molecules can be prepared from biotin-NHS (N-hydroxy-succinimide) using techniques well known in the art (e.g., biotinylation kit, Pierce Chemicals, Rockford, Ill.), and immobilized in the wells of streptavidin-coated 96-well plates (Pierce Chemical). Alternatively, antibodies reactive with an intein can be derivatized to the wells of the plate, and intein trapped in the wells by antibody conjugation. As above, preparations of an intein binding partner and a test compound are incubated in the intein-presenting wells of the plate, and the amount of complex trapped in the well can be quantitated. Exemplary methods for detecting such complexes, in addition to those described above for the GST-immobilized complexes, include immunodetection of complexes using antibodies reactive with the intein binding partner, or which are reactive with intein protein and compete with the binding partner; as well as enzyme-linked assays which rely on detecting an enzymatic activity associated with the binding partner, either intrinsic or extrinsic activity. In the instance of the latter, the enzyme can be chemically conjugated or provided as a fusion protein with the intein binding partner. To illustrate, the intein binding partner can be chemically cross-linked or genetically fused with horseradish peroxidase, and the amount of polypeptide trapped in the complex can be assessed with a chromogenic substrate of the enzyme, e.g. 3,3′-diaminobenzidine tetrahydrochloride or 4-chloro-1-naphthol. Likewise, a fusion protein comprising the polypeptide and glutathione-S-transferase can be provided, and complex formation quantitated by detecting the GST activity using 1-chloro-2,4-dinitrobenzene (Habig et al. (1974) J Biol Chem 249:7130).

[0216] For processes which rely on immunodetection for quantitating one of the proteins trapped in the complex, antibodies against the protein, such as anti-intein antibodies, can be used. Alternatively, the protein to be detected in the complex can be “epitope tagged” in the form of a fusion protein which includes, in addition to the intein sequence, a second polypeptide for which antibodies are readily available (e.g. from commercial sources). For instance, the GST fusion proteins described above can also be used for quantification of binding using antibodies against the GST moiety. Other useful epitope tags include myc-epitopes (e.g., see Ellison et al. (1991) J Biol Chem 266:21150-21157), which include a 10-residue sequence from c-myc, as well as the pFLAG system (International Biotechnologies, Inc.) or the pEZZ-protein A system (Pharmacia, N.J.).

[0217] Cell-free assays can also be used to identify compounds which interact with an intein protein and modulate an activity of an intein protein. Accordingly, in one embodiment, an intein protein is contacted with a test compound and the catalytic activity of the intein is monitored. In one embodiment, the ability of the intein to bind a target molecule is determined. The binding affinity of the intein to a target molecule can be determined according to methods known in the art.

4.12. Cell Based Assays

[0218] The invention further provides certain cell-based assays for the identification of intein modulating agents which agonize or antagonize the self-excision activity of a wild type or conditional mutant intein. In one embodiment, the effect of a test compound on the expression of an intein-containing gene is determined by transfection experiments using a reporter gene comprising a conveniently assayed marker into which has been inserted the subject intein polypeptide sequence. The reporter gene can be any gene encoding a protein which is readily quantifiable, e.g., the luciferase or CAT gene. Such reporter genes are well known in the art. The test compound is contacted with the reporter gene expressing cell line and the amount of reporter (e.g. CAT) activity produced in the presence of a test compound is compared to the amount of activity produced in the absence of the test compound.

[0219] In preferred embodiments, the cell-based assays of the present invention make use of the genetic complementation of a particular biological phenotype by the target:intein polypeptide for the purpose of identifying intein self-excision agonist and antagonist compounds. For example, the complementation of a yeast gal4 mutant phenotype, characterized by an inability to grow on a medium containing galactose as the sole carbon source, by a GAL4:intein hybrid protein is dependent upon intein self-excision from the hybrid protein. Screening for intein self-excision agonist and antagonist compounds may thus be effected by contacting the gal4 GAL4:intein yeast strain with a test compound and measuring a galactose growth characteristic in the presence and in the absence of the compound. Suitable galactose growth characteristics include colony size and doubling time on galactose media. Such an intein self-excision screen may thus be used to identify agonists and antagonists which affect this galactose growth phenotype.
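As an illustration of the doubling-time readout, the following minimal sketch compares growth with and without a test compound. It assumes exponential growth in liquid galactose medium followed by optical density (OD) readings; the OD values, time interval and function name are hypothetical.

```python
import math

def doubling_time(od_start, od_end, hours):
    """Doubling time in hours, assuming exponential growth between two OD readings."""
    if od_end <= od_start:
        return math.inf  # no net growth over the interval
    return hours * math.log(2) / math.log(od_end / od_start)

# Hypothetical readings for the gal4 GAL4:intein strain on galactose medium.
without_compound = doubling_time(0.1, 0.8, 6.0)   # three doublings in 6 h
with_compound = doubling_time(0.1, 0.2, 6.0)      # one doubling in 6 h

# A longer doubling time in the presence of the compound would be consistent
# with reduced intein self-excision (a candidate antagonist).
slows_growth = with_compound > without_compound
```

A compound that shortened the doubling time relative to the untreated culture would instead be a candidate agonist under the same assumptions.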

[0220] Another generally applicable cell-based assay useful for the identification of intein self-excision agonists and antagonists is the yeast two-hybrid assay (Gyuris et al. (1993) Cell 75: 791-803), which is readily adaptable to isolating natural (e.g. from a cDNA expression library) or synthetic (detected from a library of random open reading frames) polypeptides which interact with an intein polypeptide of the invention. This intein polypeptide/intein polypeptide binding partner interaction can be further adapted to screens for compounds which increase or decrease the interaction, thereby allowing detection of intein self-excision agonists and antagonists.

5. EXAMPLES

Example 1 Isolating Conditional Intein Mutants in Yeast

[0221] In this example, a Saccharomyces-derived intein was inserted into a derivative of the yeast GAL4 transcriptional activator and the resulting construct was used to obtain cold sensitive and temperature sensitive conditional intein alleles. Thus, a specific polypeptide bioactivity (i.e. GAL1, 10 transcriptional activation) can be controlled by a signal (such as exposure to low temperature or high temperature) which affects the auto-excision activity of an inactivating intein inserted into the polypeptide encoding that bioactivity.

[0222] First, the full length GAL4 coding region was amplified from the plasmid pGaTB (Brand and Perrimon (1993) Development 118: 401-15) by PCR so as to include a Drosophila translation initiation consensus ATG and a Myc epitope tag at the C terminal end (last 10 amino acids). This product was then subcloned into the pS5DH yeast vector using BamHI and Asp718 at the 5′ and 3′ ends respectively. pS5DH is a centromeric, URA3+ yeast/E. coli shuttle vector (Gietz and Sugino (1988) Gene 74: 527-34) modified to contain the strong constitutive Adh promoter (Susan Smith, unpublished), which has been further modified to remove a HindIII site within the polylinker. The resulting construct was then transformed into a URA3- and GAL4-deleted strain of yeast called FY760. Ura+ colonies could grow on galactose-containing media whereas Ura+ cells transformed with just the empty vector could not. These manipulations created a yeast Adh:GAL4* centromeric expression vector capable of supporting growth on media in which galactose is the sole carbon source.

[0223] This Adh:GAL4* construct was then modified so that the sequence from position 54 to 65 was AAA AAG CTT AAG. This added a unique HindIII site (AAGCTT) and destroyed an existing AflII site. In addition, a new silent AflII site was added into Gal4 (position 1461 to 1466 in the final sequence). This modified Gal4 construct, known as pS5-Gal4, was tested once more for its ability to rescue FY760 for growth on media in which galactose is the sole carbon source.

[0224] Next, the INTEIN within the S. cerevisiae VMA1 gene was amplified by PCR from genomic yeast DNA, and was subsequently subcloned into pBS (Stratagene) and sequenced. An internal HindIII restriction site within the INTEIN was destroyed by PCR-based in vitro mutagenesis. This construct was then amplified by PCR primers that included the Gal4 sequence AAG CTT AAA at the 5′ end and the Gal4-derived sequence TCC AAA GAA AAA CCG AAG TGC CCA AGT GTC TTA AG at the 3′ end. With the HindIII and AflII restriction sites added to the ends of the INTEIN sequence, this product was subcloned into the modified pS5-Gal4 gapped with HindIII and AflII. The resulting pS5-Gal4INT construct was also tested for its ability to rescue FY760 and found to enable growth as efficiently as pS5-Gal4 lacking the INTEIN. Thus, these procedures resulted in the production of a yeast centromeric expression vector capable of expressing a GAL4*::INTEIN hybrid protein which could functionally complement a gal4 mutation.
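The presence of the cloning sites in the two primer-derived flanking sequences quoted above can be checked with a short script. This is only a sketch: the recognition sequences used (AAGCTT for HindIII, CTTAAG for AflII) are the standard ones, and the helper name is introduced here for illustration.

```python
HINDIII = "AAGCTT"   # HindIII recognition sequence
AFLII = "CTTAAG"     # AflII recognition sequence

def find_sites(seq, site):
    """Return 0-based start positions of a recognition site in a DNA sequence.

    Spaces are ignored so that sequences can be pasted in codon-spaced form.
    Both sites used here are palindromic, so only one strand needs scanning.
    """
    seq = seq.replace(" ", "").upper()
    width = len(site)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == site]

# Flanking sequences as quoted for the 5' and 3' PCR primers.
five_prime_flank = "AAG CTT AAA"
three_prime_flank = "TCC AAA GAA AAA CCG AAG TGC CCA AGT GTC TTA AG"

hindiii_hits = find_sites(five_prime_flank, HINDIII)
aflii_hits = find_sites(three_prime_flank, AFLII)
```

The same helper could be run over a full INTEIN sequence to confirm that no internal HindIII or AflII site remains before the fragment is cut and subcloned.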

[0225] An alternative approach to inserting the INTEIN nucleic acid sequence into the target polypeptide-encoding sequence is to perform this operation in vivo in yeast. In this alternative method the INTEIN would be PCR amplified by long primers that include at least about 60 bp of sequence homologous to the target region within Gal4 on either side of the desired INTEIN integration site. This PCR product is then co-transformed into FY760 yeast together with the pS5-Gal4 plasmid which has been linearized at a restriction site situated close to the desired insertion site. As linear plasmids do not replicate in yeast, only molecules in which homologous recombination between the plasmid and the two ends of the PCR fragment has taken place will result in a circularized, viable plasmid containing the INTEIN.
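The design of such long primers can be sketched as below. The 60 bp homology arm follows the text; the 20 nt annealing length, the function names and the toy sequences are assumptions made for illustration and do not correspond to the actual Gal4 or INTEIN sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def design_recombination_primers(target, insert_at, intein, arm=60, anneal=20):
    """Build primers carrying target homology arms for in vivo recombination.

    target:    target coding sequence (5'->3')
    insert_at: 0-based index in `target` before which the intein is inserted
    intein:    intein coding sequence (5'->3')
    arm:       homology length on each side (text suggests at least about 60 bp)
    anneal:    assumed length of the intein-annealing part of each primer
    """
    if insert_at < arm or len(target) - insert_at < arm:
        raise ValueError("not enough flanking sequence for the homology arms")
    upstream = target[insert_at - arm:insert_at]
    downstream = target[insert_at:insert_at + arm]
    forward = upstream + intein[:anneal]
    reverse = reverse_complement(intein[-anneal:] + downstream)
    return forward, reverse

# Toy sequences standing in for the real target and intein.
toy_target = "ACGT" * 40
toy_intein = "GATTACA" * 10
fwd, rev = design_recombination_primers(toy_target, 80, toy_intein)
```

Each primer is 80 nt long under these defaults: 60 nt that match the target on one side of the insertion site, followed by 20 nt that prime on the intein.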

[0226] Finally, temperature sensitive and cold sensitive derivatives of this GAL4*::INTEIN hybrid protein-producing vector were isolated. The INTEIN sequence within pS5-Gal4INT was used as a template for mutagenic low fidelity PCR using primers just outside the unique HindIII and AflII sites. The resulting product was trimmed and subcloned into gapped pS5-Gal4. The resulting ligation was transformed into ultra-competent E. coli cells and grown up in liquid culture as an amplification step. DNA extracted from this culture was used to transform FY760 yeast before plating onto URA-selective dextrose plates. The colonies that grew on these plates were then replica plated onto two URA-selective galactose plates which were grown at 18 °C and 30 °C. Colonies that grew at different rates on these two plates were identified and re-tested for temperature sensitivity, and the plasmids they contained were recovered. These plasmids were then re-transformed into FY760 to ensure that the TS phenotype was plasmid related, and the INTEIN within the pS5-Gal4INT molecules was sequenced.
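The comparison of the two replica plates can be expressed as a simple classification rule. The sketch below is illustrative only: the colony-size scores, the two-fold cutoff and the minimum-size threshold are hypothetical, and the labels assume that 18 °C is the permissive and 30 °C the restrictive temperature for a temperature sensitive allele.

```python
def classify_colony(size_18c, size_30c, fold=2.0, min_size=0.1):
    """Label a colony from its growth on the 18 C and 30 C galactose replica plates.

    size_18c, size_30c: colony size scores (arbitrary units) on each plate
    fold:     assumed ratio above which growth is called different
    min_size: assumed score below which a colony is treated as not growing
    """
    if max(size_18c, size_30c) < min_size:
        return "no growth"
    if size_18c >= fold * size_30c:
        return "temperature-sensitive candidate"   # grows at 18 C, poorly at 30 C
    if size_30c >= fold * size_18c:
        return "cold-sensitive candidate"          # grows at 30 C, poorly at 18 C
    return "not conditional"

# Hypothetical scores for four colonies: (18 C plate, 30 C plate).
scores = {"a": (2.0, 0.2), "b": (0.3, 1.8), "c": (1.5, 1.6), "d": (0.0, 0.0)}
calls = {name: classify_colony(*pair) for name, pair in scores.items()}
```

Candidates flagged this way would still need the re-testing and plasmid re-transformation steps described above before being treated as conditional alleles.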

Example 2 Use of TS Conditional Intein Mutants to Control Other Proteins

[0227] In order to confirm that the INTEIN TS alleles already generated in a Gal4 context are autonomously TS (i.e., host context independent) we have moved the two alleles (TS1 and TS18) into Gal80 (a negative regulator of Gal4). The resulting Gal80INT constructs are then constitutively expressed in wild type yeast and growth on a galactose carbon source is assessed. If functional Gal80 is produced, endogenous Gal4 is down regulated and no growth results. If the presence of the INTEIN in Gal80 disrupts the protein function, then endogenous Gal4 is not affected and cells will grow normally.
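The logic of this Gal80 readout, in which growth reports a failure to splice, can be written out explicitly. The sketch below is an idealized model: the "TS" entry stands for a hypothetical strictly autonomous TS allele that splices at 18 °C and not at 30 °C, and the function names are introduced here.

```python
def splices(allele, temp_c):
    """Idealized splicing behaviour of an intein allele inserted into Gal80."""
    if allele == "WT":
        return True
    if allele == "dead":
        return False
    if allele == "TS":
        return temp_c <= 18   # assumed: permissive at 18 C, restrictive at 30 C
    raise ValueError(f"unknown allele: {allele}")

def grows_on_galactose(allele, temp_c):
    """Expected galactose growth of wild type yeast expressing Gal80::INTEIN."""
    functional_gal80 = splices(allele, temp_c)   # splicing restores Gal80
    gal4_active = not functional_gal80           # Gal80 represses endogenous Gal4
    return gal4_active                           # Gal4 is needed for growth

expected = {
    (allele, temp): grows_on_galactose(allele, temp)
    for allele in ("WT", "dead", "TS")
    for temp in (18, 30)
}
```

Note that the phenotype is inverted relative to the Gal4 screen: in this model an autonomously TS allele allows growth only at the restrictive temperature.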

[0228] A total of 4 positions were analyzed (immediately upstream of C127, S193, C277 and T299). Using the wild type (WT) INTEIN and a ‘dead’ INTEIN previously shown not to splice (see Gal4 report above) we established that the VMA1 INTEIN must be positioned upstream of a cysteine residue (i.e., at C127 or C277). Other INTEINS have been described as being present upstream of serine and threonine amino acids, hence the attempt to use these residues in this case.

[0229] The WT and dead intein controls acted as would be expected, i.e., the Gal80::INTEIN^(WT) construct was capable of repressing growth on galactose while the Gal80::INTEIN^(DEAD) construct was not capable of repressing growth on galactose. Interestingly, when the conditional intein alleles were inserted upstream of Gal80 C277, they conferred different phenotypes upon the mutant gal80 protein, implying that they established different levels of steady-state wild type spliced protein. The TS1 and TS18 mutant inteins, when inserted at C127 of Gal80, did not significantly interfere with growth on galactose, implying that relatively low levels of spliced Gal80 protein resulted. These two alleles appear not to splice and growth is essentially the same as for the Gal80INT-dead construct. In contrast, the two TS alleles, when inserted at C277, inhibited growth on galactose at both the permissive temperature (i.e. 18 °C) and the restrictive temperature (i.e. 30 °C), implying that relatively large amounts of spliced wild-type Gal80 protein are produced even at the restrictive temperature. These results suggest that, depending upon the protein context into which the conditional intein is inserted, different levels of spliced versus unspliced protein can be achieved. These results will be confirmed by the analysis of gross levels of spliced and unspliced Gal80 protein using an immunoprecipitation and Western blotting assay.

[0230] Therefore the invention is adaptable to the regulation of active protein concentrations at various levels depending upon the site of insertion into the target protein.

[0231] We are pursuing two other lines of investigation to generate further working examples. The first is to move the other available TS alleles into the two C127 and C277 positions in an attempt to identify one of the alleles as being strictly autonomously TS for the galactose growth phenotype when placed in the context of Gal80.

[0232] Another approach we are taking is to move the TS INTEINS together with a small region of the context in which they were generated (in Gal4). It has been shown that the INTEIN interacts with residues of the host protein immediately up and downstream of its insertion site during splicing (see Nogami et al. (1997) Genetics 147:73). Therefore it is possible that the galactose phenotype of the TS alleles tested in Gal80 may be due to the temperature sensitive nature of the interactions of the INTEIN with these flanking amino acids. Thus the transfer of these residues together with the INTEIN may maintain the conditional nature of the system.

[0233] We will also insert the TS1 and TS18 INTEINS into GFP together with a short region (2-4 amino acids) flanking the original insertions. By using commercially available anti-GFP antibodies and PAGE/Western blot analysis we will test whether this results in host protein “independent” splicing. This approach would leave a short stretch of “foreign” amino acids in the host protein, but may represent one approach with which the system could be optimized.

[0234] We further note here that if an autonomously acting TS allele is identified, it may be possible to ‘improve’ its characteristics by further rounds of mutagenesis (as was accomplished, for example, in some of the screens for brighter GFP molecules).

[0235] Still further, we note that if the “flanking” pieces are required to make a conditional system, it may be possible to utilize this sequence for particular purposes. For example, these flanks will only come together after splicing and could potentially be used as a tag (given the production of suitable antibodies) with which to identify functional (spliced) host protein. These tagged intein constructs could be utilized in screens to identify interacting compositions which agonize or antagonize the intein splicing reaction.

Example 3 Use of Condition-Sensitive Mutants in Plants

[0236] Low temperature is a major environmental limitation to the production of agricultural crops. For example, late spring frosts delay seed germination, early fall frosts decrease the quality and yield of harvests and winter low temperatures decrease the survival of overwintering crops, such as winter cereals and fruit trees. However, some plants have the ability to withstand prolonged subfreezing temperatures. If proteins involved in the development of frost tolerance in these plants, as well as the corresponding genes, can be identified, it may be possible to transform frost sensitive crop plants into frost tolerant crop plants and extend the range of crop production.

[0237] Biological organisms can survive icy environments by inhibiting internal ice formation. This strategy requires the synthesis of antifreeze proteins (AFPs) or thermal hysteresis proteins (THPs). Four distinct types of AFPs have been identified in fish and a number of different THPs have been identified in insects. These previous findings suggest that this adaptive mechanism has arisen independently in different organisms. Antifreeze proteins are thought to bind to ice crystals to prevent further growth of the crystals. The presence of antifreeze proteins can be determined (1) by examining the shape of ice crystals as they form and (2) by measuring the existence of thermal hysteresis (the difference between the temperatures at which a particular solution melts and freezes).
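The thermal hysteresis measurement defined above reduces to a single subtraction, sketched below. The temperature values are hypothetical and are included only to show the sign convention (melting point minus freezing point).

```python
def thermal_hysteresis(melting_point_c, freezing_point_c):
    """Thermal hysteresis in degrees C: melting temperature minus freezing temperature."""
    return melting_point_c - freezing_point_c

# Hypothetical measurements on two solutions.
buffer_only = thermal_hysteresis(-0.5, -0.5)     # melts and freezes at the same point
with_protein = thermal_hysteresis(-0.5, -1.1)    # freezing point depressed below melting

# A non-zero gap is the signature attributed to antifreeze activity.
shows_hysteresis = with_protein > buffer_only
```

A solution without antifreeze activity is expected to give a value near zero, so the measurement is most informative when compared against a protein-free control run on the same instrument.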

[0238] It was generally understood that antifreeze proteins did not exist in plants. Instead, it was thought that some internal mechanism of the plant cells adapted them to withstand external ice crystal formation on their outer cell walls without damaging the cell. For example, although a plant gene expressed at low temperature codes for a protein similar in amino acid sequence to an antifreeze protein, the encoded protein was not obtained in sufficient amounts to determine whether it exhibited an antifreeze activity in the plant and particularly within the plant cell. Fish antifreeze protein can also increase frost tolerance in plants.

[0239] Examples of plant antifreeze proteins include the Arachis hypogaea cold shock protein (AHCSP33) (Dave et al. (1998) Phytochemistry 49:2207-13); a carrot leucine-rich-repeat protein that inhibits ice recrystallization, which is similar to the antifreeze proteins found in fish and which accumulates antifreeze activity when expressed in transgenic tobacco plants (Worrall et al. (1998) Science 282:115-117); the product of the Arabidopsis thaliana cold-induced kin1 gene, an alanine-, glycine- and lysine-rich protein which is also induced by osmotic stress (Kurkela et al. (1990) Plant Mol. Biol. 15:137-144; Tahtiharju et al. (1997) Planta 203:442-447); and antifreeze proteins in rye, which are reported as being similar to pathogenesis-related proteins such as endochitinases (Hon et al. (1995) Plant Physiol. 109(3):879-89). Furthermore, other studies of cold-inducible genes in plants have suggested the existence of a family of cold-resistant polypeptides. A rapid and stable change occurs in the translatable poly(A)+ RNA populations extracted from leaves of plants exposed to low temperatures. Total protein analysis of the plant tissues was conducted to detect proteins which might be associated with frost tolerance in plants. Proteins found in cold acclimated leaf extracts having molecular weights of 110 kD, 82 kD, 66 kD, 55 kD and 13 kD were not found in non-acclimated leaf extracts. It is thought that the increased expression of certain mRNAs may encode proteins that are involved directly in the development of increased freezing tolerance for the plant. High molecular mass proteins are believed to be associated with cold acclimation in spinach. The total protein content of the acclimated spinach leaf was assessed, and cold acclimated proteins having molecular weights of 110 kD, 90 kD and 79 kD were identified. However, their location and function within the cell remain unknown.

[0240] In certain instances cold tolerance has been conferred by transgenic expression of, e.g., a synthetic antifreeze protein in potato plants (Wallis et al. (1997) Plant Mol. Biol. 35:323-330) or a fusion of Staphylococcal protein A and antifreeze protein (AFP) from polar fish (Hightower et al. (1991) Plant Mol. Biol. 17:1013-1021). Further, certain studies have suggested that accumulation of antifreeze proteins is temperature or cold specific. For instance, constitutive expression of a fish antifreeze protein encoding gene does not lead to measurable antifreeze protein until the plant is exposed to colder conditions, suggesting that such AFP may be inherently unstable at warmer temperatures (Kenward et al. (1993) Plant Mol. Biol. 23:377-385).

[0241] Therefore, in one embodiment, this invention contemplates the constitutive expression of AFP wherein the activity of the AFP polypeptide so expressed may be rapidly induced so as to confer immediate cold tolerance and/or ice crystal growth inhibition in the absence of de novo synthesis. It is known that AFP polypeptides depress the freezing temperature of a solution in a non-colligative manner (Chapski et al. (1997) FEBS Lett. 412: 241-244). Therefore, the rapid induction of an existing latent cold tolerance bioactivity would be expected to confer greater resistance to sudden frost conditions than mechanisms requiring de novo synthesis of the AFP polypeptides.

[0242] Accordingly, in one aspect, this invention contemplates regulatable AFP proteins comprising a condition-sensitive mutant intein, such as AFP proteins comprising mutant temperature sensitive inteins, such as temperature sensitive alleles of the S. cerevisiae vacuolar ATPase catalytic subunit (VMA) intein-containing gene. Examples of these temperature sensitive alleles of the Sce. VMA intein sequences are set forth in SEQ ID Nos. 2 to 9. The amino acid changes in the TS alleles due to these specific mutations are listed in Table 3 above, wherein L212P refers to a Leucine→Proline change at position 212.

[0243] In one example, a temperature sensitive allele is inserted into an AFP gene from winter flounder which codes for an alanine-rich alpha helical type I AFP. Plants may be transformed with an expression vector comprising the AFP-intein hybrid. Transformation may be accomplished by any of the methods which have been well documented in the art.

[0244] In particular, various methods are known to one of ordinary skill in the art to accomplish such genetic transformation of plants and plant tissues. For example, these methods include transformation by Agrobacterium species and transformation by direct gene transfer. These methods are described in detail in U.S. Pat. No. 5,789,214, which is incorporated herein by reference.

[0245] The Agrobacterium system permits routine transformation of a variety of plant tissue. Examples of such plants include tobacco, tomato, sunflower, cotton, rapeseed, potato, soybean, and poplar. While the host range for Ti plasmid transformation using A. tumefaciens as the infecting agent is known to be very large, tobacco has been a host of choice in laboratory experiments because of its ease of manipulation. Another example is Agrobacterium rhizogenes, which has also been used as a vector for plant transformation. Transformation using A. rhizogenes has been successfully utilized to transform, for example, alfalfa, Solanum nigrum L., and poplar.

[0246] In addition, the art also discloses many direct gene transfer procedures which have been developed to successfully transform plants and plant tissues without the use of an Agrobacterium intermediate (see, for example, Koziel et al., Biotechnology 11: 194-200 (1993)). For example, exogenous DNA can be introduced into cells or protoplasts by microinjection (Reich, T. J. et al., Bio/Technology 4: 1001 (1986)). Another example involves bombardment of cells by microprojectiles carrying DNA (see Klein, T. M. et al., Nature 327: 70 (1987)).

[0247] Accordingly, tobacco plants may be transformed using any of the methods described above with an AFP-intein gene construct which is expressed from the Cauliflower Mosaic virus 19S RNA promoter using the nopaline synthetase polyadenylation site. Expression of the AFP-intein may be confirmed by Western blot analysis. Accumulation of (non-functional) AFP was observed at warmer temperatures, and it was observed that a shift to colder temperatures results in the formation of functional AFP and an excised autonomous intein.

Example 4 Inducibly Trans-Spliced Thymidine Kinase

[0248] In a second example, an intein trans-spliced regulatable form of thymidine kinase is constructed and expressed under the control of a pituitary hormone promoter (human GH or glycoprotein hormone alpha-subunit) using recombinant adenoviral vectors. Injection into nude mice carrying propagated GH3 cell pituitary adenomas results in ganciclovir-dependent cytotoxicity which is further dependent upon a chemical signal (rapamycin) to trigger trans-splicing of the thymidine kinase exteins into a single mature thymidine kinase polypeptide. The added level of control provided by the rapamycin chemical signal affords greater flexibility in achieving optimal tumor cell cytotoxicity in a temporally regulatable manner. Further advantages include regulating drug toxicity and assuring cell specificity in the host organism.

[0249] First, in order to ensure that the insertion of the regulatably trans-spliced intein disrupts the thymidine kinase bioactivity of the target polypeptide, a BLAST protein alignment with the target human herpes simplex virus thymidine kinase polypeptide sequence is performed. Two representative matches with related viral thymidine kinase genes from other host species are shown below. This step assures that the trans-spliced intervening protein sequence segments are appropriately inserted so as to interfere with the target protein's activity. Covalent separation of two major segments of a target polypeptide and concomitant fusion of the ends of these segments to intervening protein sequences is unlikely to fail to disrupt the target polypeptide's bioactivity. Nonetheless, this step ensures that the trans-spliced intein units are not placed so as to disrupt an unconserved, nonessential amino- or carboxy-terminal portion of the polypeptide. Furthermore, such an analysis assures that the site of the disrupting trans-spliced intein does not correspond to an unconserved “linker” sequence, without which the amino and carboxy exteins might still reassemble by virtue of inherent protein domain/protein domain affinities. The BLAST homology searching program (NCBI's sequence similarity search tool) was used to identify homologs of the Herpes Simplex Virus type 2 thymidine kinase (TK) polypeptide sequence (Swiss-Prot. Acc. No. 3915741) to be used in the experiment. Representative related viral TK polypeptide sequences are shown below. Comparison of the human type 2 TK sequence (Query) to both a bovine HSV viral TK homolog (TK homolog 1, Subject) and a related pseudorabies viral TK homolog (TK homolog 2, Subject) reveals several candidate serine (S), threonine (T) and cysteine (C) residues which are conserved in both evolutionarily distant homologs. The cysteine at amino acid 172 of the human HSV TK polypeptide is chosen on the basis of: its chemical suitability for intein excision as an amino terminal end of a carboxy-extein; its presence near the center of the polypeptide, flanked by regions of conserved sequence; and its presence in a large block of strictly conserved sequence, contraindicative of a dispensable polypeptide loop domain.
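The search for conserved serine, threonine and cysteine residues can be sketched as a column-by-column scan of aligned segments. The three twelve-residue segments below are the gap-free starts of the alignment block beginning at Query position 169 in the two alignments of this example; the function name and the restriction to ungapped segments are simplifications introduced here.

```python
SPLICE_COMPATIBLE = {"S", "T", "C"}   # residues considered for the extein junction

def conserved_candidates(query, subjects, query_start):
    """List (position, residue) pairs where a S/T/C in the query is identical
    in every aligned subject segment.

    query:       ungapped query segment
    subjects:    subject segments aligned column-for-column with the query
    query_start: 1-based position of the first query residue
    """
    hits = []
    for offset, residue in enumerate(query):
        if residue in SPLICE_COMPATIBLE and all(s[offset] == residue for s in subjects):
            hits.append((query_start + offset, residue))
    return hits

query_169_180 = "SLLCYPAARYLM"   # human HSV type 2 TK, residues 169-180
homolog_1 = "SLLCYPLARYLT"       # bovine HSV TK, aligned segment
homolog_2 = "ATVCFPLARFIV"       # pseudorabies virus TK, aligned segment

candidates = conserved_candidates(query_169_180, [homolog_1, homolog_2], 169)
```

For these segments the scan returns only the cysteine at position 172. A fuller analysis would need to handle gapped columns and would also weigh the surrounding conservation and the position within the polypeptide, as described above.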

[0250] Whereas for most polypeptides specific guidance for insertion site selection will be easily obtained by comparison with other proteins with the same bioactivity, in certain instances, such as the instant example, additional guidance will be available in the form of protein crystal structure studies (see, e.g., http://www.ncbi.nlm.nih.gov/Structure/, which provides access to a large bank of proteins for which crystal structures are available).

TK homolog 1 (from bovine HSV; Swiss-Prot. Acc. No. 125440)

Query: 49  LLRVYIDGPHGVGKTTTSAQLMEALGPRDNIVYVPEPMTYWQVLGASETLTNIYNTQHRL 108
           LLRVY+DGPHG+GKTT+++L  G ++Y+PEPM+YW G ++ +Y QHR+
Sbjct: 4   LLRVYVDGPHGLGKTTAASRLASERG---DAIYLPEPMSYWSGAGEDDLVARVYTAQHRM 60

Query: 109 DRGEISAGEAAVVMTSAQITMSTPYAATDAVLAPHIGGEAVGPQAPPPALTLVFDRHPIA 168
           DRGEI A EAA V+ AQ+TMSTPY A + ++A PP L L+FDRHP A
Sbjct: 61  DRGEIDAREAAGVVLGAQLTMSTPYVALNGLIAPHIGEEPSPGNATPPDLILIFDRHPTA 120

Query: 169 SLLCYPAARYLMGSMTPQAVLAFVALMPPTAPGTNLVLGVLPEAEHADRLARRQRPGERL 228
           SLLCYP ARYL + ++VL+ +AL+PPT PGTNL+LG P +H RL R PGE
Sbjct: 121 SLLCYPLARYLTRCLPIESVLSLIALIPPTPPGTNLILGTAPAEDHLSRLVARGPPGELP 180

Query: 229 DLAMLSAIRRVYDLLANTVRYLQRGGRWREDWGRLTGVAAATPRPDPEDGAGSLPRIEDT 288
           DML AIR VY LLANTV+YLQ GG WR D G   P PEG +P  +T
Sbjct: 181 DARMLRAIRYVYALLANTVKYLQSGGSWRADLG---SEPPRLPLAPPEIGDPNNPGGHNT 237

Query: 289 LALFRVPELLAPNGDLYHIFAWVLDVLADRLLPMHLF 325
           LL +A G  ++W LD+LADRL M++F
Sbjct: 238 L-LALIHGAGATRG-CAAMTSWTLDLLADRLRSMNMF 272

[0251] TK homolog 2 (from Pseudorabies virus (STRAIN NIA-3); Swiss-Prot. Acc. No. 125456)

Query: 49  LLRVYIDGPHGVGKTTTSAQLMEALGPRDNIVYVPEPMTYWQVLGASETLTNIYNTQHRL 108
           +LR+Y+DG+ GK+TT+ + ALG  +YVPEPM YW+L ++T+IY+ Q R
Sbjct: 3   ILRIYLDGAYDTGKSTTARVM--ALG---GALYVPEPMAYWRTLFDTDTVAGIYDAQTRK 57

Query: 109 DRGEISAGEAAVVMTSAQITMSTPYAATDAVLAPHIGGEAVGPQAPPPALTLVFDRHPIA 168
           G+S+AA+V  Q +TPY  LP G  GP  P+T+VFDRHP+A
Sbjct: 58  QNGSLSEEDAALVTAHDQAAFATPYLLLHTRLVPLFGPAVEGP----PEMTVVFDRHPVA 113

Query: 169 SLLCYPAARYLMGSMTPQAVLAFVALMPPTAPGTNLVLGVLPEAEHADRLARRQRPGERL 228
           + +C+P AR+++G++ A+A+P PG NLV+ L EH RL R R GE+
Sbjct: 114 ATVCFPLARFIVGDISAAAFVGLAATLPGEPPGGNLVVASLDPDEHLRRLRARARAGEHV 173

Query: 229 DLAMLSAIRRVYDLLANTVRYLQRGGRWREDWGRLTGVAAAT-----------PRPDPED 277
           D+L+A+R VY +L NT RYL G RWR+DWGR   T   PR DPE
Sbjct: 174 DARLLTALRNVYAMLVNTSRYLSSGRRWRDDWGRAPRFDQTTRDCLALNELCRPRDDPE- 232

Query: 278 GAGSLPRIEDTL-ALFRVPELLAPNGDLYHIFAWVLDVLADRLLPMHLFVLDYDQSPVGC 336
           ++DTL ++ PEL  G  +AW +D L +LLP+ + +D SP C
Sbjct: 233 -------LQDTLFGAYKAPELCDRRGRPLEVHAWAMDALVAKLLPLRVSTVDLGPSPRVC 285

Query: 337 RDALLRLTAGMIPTRVTTAGSIAEIRDLARTFAREVG 373
           A+  TGM  VT+  IR  F E+G
Sbjct: 286 AAAVAAQTRGM---EVTESAYGDHIRQCVCAFTSEMG 319

[0252] Therefore an appropriate set of constructs for creating the trans-spliced TK polypeptide would be: TK_(codons 1-171)-INTEIN^(N) and INTEIN^(C)-TK_(codons 172(cys)-376). These two polypeptides are modified further so as to subject them to regulated trans-splicing as described below.

[0253] As the instant application is in a mammalian system, the temperature sensitive conditional intein mutants are not readily exploitable. Instead, this example takes advantage of the observation that trans-splicing of an Extein^(N)-Intein^(N) polypeptide to an Intein^(C)-Extein^(C) polypeptide can occur in vitro (Southworth et al. (1998) EMBO J 17: 918-26). The application of inducible trans-splicing to regulation of a hypothetical target polypeptide is diagramed in FIG. 3. Formation of the intein splicing active site requires proper folding of the intein to bring together the two splice junctions, which can be separated by as much as 500 amino acids or more. The in vitro formation of the intein splicing active site was guided by Intein^(N)/Intein^(C) protein/protein interactions. In particular, the Intein^(N) and Intein^(C) sequences collectively comprised the entire Psp Pol-1 Intein-encoded endonuclease which, when proteolytically cleaved into two pieces, is able to reassemble by virtue of “innate” protein/protein affinities (Southworth et al. (1998) EMBO J 17: 918-26). Following noncovalent in vitro association of the Extein^(N)-Intein^(N) and Intein^(C)-Extein^(C) polypeptides, activation of the intein auto-excision function followed spontaneously to yield covalently joined Extein^(N)-Extein^(C) product and a noncovalently joined Intein^(N):Intein^(C) complex. This in vivo trans-splicing application is expected to function with relative efficiency; indeed, certain protein/protein reconstitutions have been shown to occur more efficiently in vivo than in vitro (Gross et al. (1996) Protein Sci 5: 320-30). Thus trans-splicing of intein amino and carboxy-terminal domains can occur spontaneously in vitro provided that the intein units are brought together by appropriate intermolecular attractions.

[0254] The instant example takes advantage of this observation by using a recently developed chemical dimerizer system (Pruschy et al. (1994) Curr Biol 1: 163-72) to bring the Extein^(N)-Intein^(N) and Intein^(C)-Extein^(C) polypeptides together in a regulatable manner so as to potentiate trans-splicing of the extein units to yield an Extein^(N)-Extein^(C) product.

[0255] The chemical dimerizer utilized in this application is capable of crosslinking FKBP (FK506 binding protein) and FKBP Rapamycin Associated Protein (FRAP). FKBP12 belongs to a class of immunophilin proteins, originally discovered because of their high affinity for immunosuppressive drugs. FKBP12 binds to the natural products FK506 and rapamycin with high affinity (K_(D)=0.4 nM and 0.2 nM respectively). The protein has intrinsic peptidyl-prolyl cis-trans isomerase activity, which is blocked on binding to either FK506 or rapamycin, but which does not appear to be related to the ability of these molecules to inhibit intracellular signaling pathways. Instead, their actions are mediated by the formation of composite surfaces in the FKBP12-FK506 and FKBP12-rapamycin complexes that allow binding to calcineurin and the lipid kinase, FKBP-rapamycin-associated protein (FRAP) respectively. Inhibiting the function of calcineurin and FRAP results in the inhibition of different signaling pathways. Studies of FK506 reveal that it possesses two protein-binding surfaces, an immunophilin-binding surface and a calcineurin-binding one; it can thus be termed a “chemical inducer of dimerization” (CID). Two factors that are important in the selection of FK506 as a building block for a designed CID are its ability to cross cell membranes and its high affinity for FKBPs. To construct an FK506 dimer, two FK506 monomers can be dimerized via a functional group within the calcineurin-binding domain. The resulting dimer still binds to FKBP12, but the complex of the dimer with FKBP12 should not bind to calcineurin and thus should not block TCR signaling. Furthermore, modified chemical dimerizers which bind only to genetically modified forms of FKBP binding proteins are also available and potentially eliminate concerns about undesirable immunosuppressive effects from binding to endogenous FKBP (Clackson et al. (1998) PNAS 95: 10437-42).
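To give a feel for what the quoted dissociation constants mean, the following Python sketch computes the fraction of FKBP12 occupied by ligand under a simple one-site equilibrium model, fraction = [L] / ([L] + K_D). The model, the function name `fraction_bound`, and the example ligand concentrations are assumptions made for illustration; only the two K_D values come from the text above. The model treats the free ligand concentration as known and ignores ligand depletion and the ternary complex with calcineurin or FRAP.

```python
def fraction_bound(free_ligand_nM: float, kd_nM: float) -> float:
    """Fraction of protein occupied at equilibrium for one-site binding.

    Assumes a single binding site and a known free ligand concentration,
    both expressed in nM.
    """
    if free_ligand_nM < 0 or kd_nM <= 0:
        raise ValueError("concentration must be >= 0 and K_D must be > 0")
    return free_ligand_nM / (free_ligand_nM + kd_nM)


# K_D values quoted in the text for FKBP12 (nM).
KD_NM = {"FK506": 0.4, "rapamycin": 0.2}

# Example free ligand concentrations (hypothetical, chosen for illustration).
for ligand, kd in KD_NM.items():
    for conc in (0.1, 1.0, 10.0):
        occupancy = fraction_bound(conc, kd)
        print(f"{ligand}: {conc} nM free ligand, fraction bound = {occupancy:.2f}")
```

Under this model, half of the protein is occupied when the free ligand concentration equals K_D, which is why sub-nanomolar constants imply substantial binding at low nanomolar drug levels.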

[0256] In this example, the Extein^(N)-Intein^(N) polypeptide is fused to FKBP and the Intein^(C)-Extein^(C) polypeptide is fused to FRAP. Both FKBP and FRAP are capable of binding simultaneously to rapamycin. In practice either rapamycin binding protein can be used with either amino or carboxy-terminal target polypeptide. A homopolymeric “hinge” region (e.g. polyglycine, polyG) is also added between each target polypeptide fragment and its rapamycin binding protein domain. Such hinge regions are predicted to lack secondary structure following protein folding. As a result, the intein amino and carboxy terminal domains are expected to be free to associate upon dimerization of the FKBP and FRAP domains with rapamycin. The resulting two polypeptides, TK_(codons1-171)-Intein^(N)-polyG-FKBP and FRAP-polyG-Intein^(C)-TK_(codons172(cys)-376), can be stably co-expressed. The thymidine kinase bioactivity can then be induced at any time by delivery of the dimerizer drug rapamycin, which causes the non-covalent association of the two protein halves to form TK_(codons1-171)-Intein^(N)-polyG-FKBP:rapamycin:FRAP-polyG-Intein^(C)-TK_(codons172(cys)-376). This complex undergoes intein trans-splicing via association of the Intein^(N) and Intein^(C) domains, to generate a TK₁₋₃₇₆ complete thymidine kinase polypeptide product and an Intein^(N)-polyG-FKBP:rapamycin:FRAP-polyG-Intein^(C) byproduct polypeptide.
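The domain arrangement and the outcome of induced trans-splicing can be summarized with a small domain-level model in Python. This is a bookkeeping sketch only, not a simulation of the chemistry: the class `Construct`, the function `trans_splice`, and the all-or-nothing treatment of the dimerizer (no splicing at all without rapamycin) are simplifying assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Construct:
    """A fusion polypeptide written as an ordered tuple of domain names."""
    domains: tuple

    def __str__(self) -> str:
        return "-".join(self.domains)


# The two co-expressed fusion polypeptides named in the text.
N_CONSTRUCT = Construct(("TK(1-171)", "Intein^N", "polyG", "FKBP"))
C_CONSTRUCT = Construct(("FRAP", "polyG", "Intein^C", "TK(172-376)"))


def trans_splice(n_half: Construct, c_half: Construct,
                 dimerizer_present: bool) -> Optional[tuple]:
    """Return (spliced product, byproduct complex), or None without dimerizer.

    Assumption: splicing is treated as strictly dependent on the dimerizer.
    """
    if not dimerizer_present:
        return None
    # The exteins are the outermost domains: first of the N half, last of the C half.
    product = Construct((n_half.domains[0], c_half.domains[-1]))
    # Everything else stays behind, held together by the dimerizer.
    byproduct = (
        "-".join(n_half.domains[1:]) + ":rapamycin:" + "-".join(c_half.domains[:-1])
    )
    return product, byproduct


result = trans_splice(N_CONSTRUCT, C_CONSTRUCT, dimerizer_present=True)
if result is not None:
    product, byproduct = result
    print("product:  ", product)
    print("byproduct:", byproduct)
```

Swapping FKBP and FRAP between the two halves, which the text notes is permissible, only changes the domain names in the two tuples; the extein bookkeeping is unaffected.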

[0257] The two trans-spliced polypeptide-encoding gene constructs can be delivered to a target cell or tissue by a virus or other suitable delivery system known in the art.

Equivalents

[0258] Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents of the specific embodiments of the invention described herein. Such equivalents are intended to be encompassed by the following claims.

1 54 1 454 PRT Saccharomyces cerevisiae 1 Cys Phe Ala Lys Gly Thr AsnVal Leu Met Ala Asp Gly Ser Ile Glu 1 5 10 15 Cys Ile Glu Asn Ile GluVal Gly Asn Lys Val Met Gly Lys Asp Gly 20 25 30 Arg Pro Arg Glu Val IleLys Leu Pro Arg Gly Arg Glu Thr Met Tyr 35 40 45 Ser Val Val Gln Lys SerGln His Arg Ala His Lys Ser Asp Ser Ser 50 55 60 Arg Glu Val Pro Glu LeuLeu Lys Phe Thr Cys Asn Ala Thr His Glu 65 70 75 80 Leu Val Val Arg ThrPro Arg Ser Val Arg Arg Leu Ser Arg Thr Ile 85 90 95 Lys Gly Val Glu TyrPhe Glu Val Ile Thr Phe Glu Met Gly Gln Lys 100 105 110 Lys Ala Pro AspGly Arg Ile Val Glu Leu Val Lys Glu Val Ser Lys 115 120 125 Ser Tyr ProIle Ser Glu Gly Pro Glu Arg Ala Asn Glu Leu Val Glu 130 135 140 Ser TyrArg Lys Ala Ser Asn Lys Ala Tyr Phe Glu Trp Thr Ile Glu 145 150 155 160Ala Arg Asp Leu Ser Leu Leu Gly Ser His Val Arg Lys Ala Thr Tyr 165 170175 Gln Thr Tyr Ala Pro Ile Leu Tyr Glu Asn Asp His Phe Phe Asp Tyr 180185 190 Met Gln Lys Ser Lys Phe His Leu Thr Ile Glu Gly Pro Lys Val Leu195 200 205 Ala Tyr Leu Leu Gly Leu Trp Ile Gly Asp Gly Leu Ser Asp ArgAla 210 215 220 Thr Phe Ser Val Asp Ser Arg Asp Thr Ser Leu Met Glu ArgVal Thr 225 230 235 240 Glu Tyr Ala Glu Lys Leu Asn Leu Cys Ala Glu TyrLys Asp Arg Lys 245 250 255 Glu Pro Gln Val Ala Lys Thr Val Asn Leu TyrSer Lys Val Val Arg 260 265 270 Gly Asn Gly Ile Arg Asn Asn Leu Asn ThrGlu Asn Pro Leu Trp Asp 275 280 285 Ala Ile Val Gly Leu Gly Phe Leu LysAsp Gly Val Lys Asn Ile Pro 290 295 300 Ser Phe Leu Ser Thr Asp Asn IleGly Thr Arg Glu Thr Phe Leu Ala 305 310 315 320 Gly Leu Ile Asp Ser AspGly Tyr Val Thr Asp Glu His Gly Ile Lys 325 330 335 Ala Thr Ile Lys ThrIle His Thr Ser Val Arg Asp Gly Leu Val Ser 340 345 350 Leu Ala Arg SerLeu Gly Leu Val Val Ser Val Asn Ala Glu Pro Ala 355 360 365 Lys Val AspMet Asn Gly Thr Lys His Lys Ile Ser Tyr Ala Ile Tyr 370 375 380 Met SerGly Gly Asp Val Leu Leu Asn Val Leu Ser Lys Cys Ala Gly 385 390 395 400Ser Lys Lys Phe Arg Pro Ala Pro Ala Ala Ala Phe Ala Arg Glu Cys 405 410415 Arg Gly 
Phe Tyr Phe Glu Leu Gln Glu Leu Lys Glu Asp Asp Tyr Tyr 420425 430 Gly Ile Thr Leu Ser Asp Asp Ser Asp His Gln Phe Leu Leu Ala Asn435 440 445 Gln Val Val Val His Asn 450 2 454 PRT Artificial SequenceDescription of Artificial Sequence Synthetic VMA allele mutation 2 CysPhe Ala Lys Gly Thr Asn Val Leu Met Ala Asp Gly Ser Ile Glu 1 5 10 15Cys Ile Glu Asn Ile Glu Val Gly Asn Lys Val Met Gly Lys Asp Gly 20 25 30Arg Pro Arg Glu Val Ile Lys Leu Pro Arg Gly Arg Glu Thr Met Tyr 35 40 45Ser Val Val Gln Lys Ser Gln His Arg Ala His Lys Ser Asp Ser Ser 50 55 60Arg Glu Val Pro Glu Leu Leu Lys Phe Thr Cys Asn Ala Thr His Glu 65 70 7580 Leu Val Val Arg Thr Pro Arg Ser Val Arg Arg Leu Ser Arg Thr Ile 85 9095 Lys Gly Val Glu Tyr Phe Glu Val Ile Thr Phe Glu Met Gly Gln Lys 100105 110 Lys Ala Pro Asp Gly Arg Ile Val Glu Leu Val Lys Glu Val Ser Lys115 120 125 Ser Tyr Pro Ile Ser Glu Gly Pro Glu Arg Ala Asn Glu Leu ValGlu 130 135 140 Ser Tyr Arg Lys Ala Ser Asn Lys Ala Tyr Phe Glu Trp ThrIle Glu 145 150 155 160 Ala Arg Asp Leu Ser Leu Leu Gly Ser His Val ArgLys Ala Thr Tyr 165 170 175 Gln Thr Tyr Ala Pro Ile Leu Tyr Glu Asn AspHis Phe Phe Asp Tyr 180 185 190 Met Gln Lys Ser Lys Phe His Leu Thr IleGlu Gly Pro Lys Val Leu 195 200 205 Ala Tyr Leu Pro Gly Leu Trp Ile GlyAsp Gly Leu Ser Asp Arg Ala 210 215 220 Thr Phe Ser Val Asp Ser Arg AspThr Ser Leu Met Glu Arg Val Thr 225 230 235 240 Glu Tyr Ala Glu Lys LeuAsn Leu Cys Ala Glu Tyr Lys Asp Arg Lys 245 250 255 Glu Pro Gln Val AlaLys Thr Val Asn Leu Tyr Ser Lys Val Val Arg 260 265 270 Gly Asn Gly IleArg Asn Asn Leu Asn Thr Glu Asn Pro Leu Trp Asp 275 280 285 Ala Ile ValGly Leu Gly Phe Leu Lys Asp Gly Val Lys Asn Ile Pro 290 295 300 Ser PheLeu Ser Thr Asp Asn Ile Gly Thr Arg Glu Thr Phe Leu Ala 305 310 315 320Gly Leu Ile Asp Ser Asp Gly Tyr Val Thr Asp Glu His Gly Ile Lys 325 330335 Ala Thr Ile Lys Thr Ile His Thr Ser Val Arg Asp Gly Leu Val Ser 340345 350 Leu Ala Arg Ser Leu Gly Leu Val Val Ser Val Asn Ala Glu Pro Ala355 360 365 Lys Val Asp 
Met Asn Gly Thr Lys His Lys Ile Ser Tyr Ala IleTyr 370 375 380 Met Ser Gly Gly Asp Val Leu Leu Asn Val Leu Ser Lys CysAla Gly 385 390 395 400 Ser Lys Lys Phe Arg Pro Ala Pro Ala Ala Ala PheAla Arg Glu Cys 405 410 415 Arg Gly Phe Tyr Phe Glu Leu Gln Glu Leu LysGlu Asp Asp Tyr Tyr 420 425 430 Gly Ile Thr Leu Ser Asp Asp Ser Asp HisGln Phe Leu Leu Ala Asn 435 440 445 Gln Val Val Val His Asn 450 3 454PRT Artificial Sequence Description of Artificial Sequence Synthetic VMAallele mutation 3 Cys Phe Ala Lys Gly Thr Asn Val Leu Met Ala Asp GlySer Ile Glu 1 5 10 15 Cys Ile Glu Asn Ile Glu Val Gly Asn Lys Val MetGly Lys Asp Gly 20 25 30 Arg Pro Arg Glu Val Ile Lys Leu Pro Arg Gly ArgGlu Thr Met Tyr 35 40 45 Ser Val Val Gln Lys Ser Gln His Arg Ala His LysSer Asp Ser Ser 50 55 60 Arg Glu Val Pro Glu Leu Leu Lys Phe Thr Cys AsnAla Thr His Glu 65 70 75 80 Leu Val Val Arg Thr Pro Arg Ser Val Arg ArgLeu Ser Arg Thr Ile 85 90 95 Lys Gly Val Glu Tyr Phe Glu Val Ile Thr PheGlu Met Gly Gln Lys 100 105 110 Lys Ala Pro Asp Gly Arg Ile Val Glu LeuVal Lys Glu Val Ser Lys 115 120 125 Ser Tyr Pro Ile Ser Glu Gly Pro GluArg Ala Asn Glu Leu Val Glu 130 135 140 Ser Tyr Arg Lys Ala Ser Asn LysAla Tyr Phe Glu Trp Thr Ile Glu 145 150 155 160 Ala Arg Asp Leu Ser LeuLeu Gly Ser His Val Arg Lys Ala Thr Tyr 165 170 175 Gln Thr Tyr Ala ProIle Leu Tyr Glu Asn Asp His Phe Phe Asp Tyr 180 185 190 Met Gln Lys SerLys Phe His Leu Thr Ile Glu Gly Pro Lys Val Leu 195 200 205 Ala Tyr LeuLeu Gly Leu Trp Ile Gly Asp Gly Leu Ser Asp Arg Ala 210 215 220 Thr PheSer Val Asp Ser Arg Asp Thr Ser Leu Met Glu Arg Val Thr 225 230 235 240Glu Tyr Ala Glu Lys Leu Asn Leu Cys Ala Glu Tyr Lys Asp Arg Lys 245 250255 Glu Pro Gln Val Ala Lys Thr Val Asn Leu Tyr Ser Lys Val Val Arg 260265 270 Gly Asn Gly Ile Arg Thr Asn Leu Asn Thr Glu Asn Pro Leu Trp Asp275 280 285 Ala Ile Val Gly Leu Gly Phe Leu Lys Asp Gly Val Lys Asn IlePro 290 295 300 Ser Phe Leu Ser Thr Asp Asn Ile Gly Thr Arg Glu Thr PheLeu Ala 305 310 315 320 Gly Leu Ile 
Asp Ser Asp Gly Tyr Val Thr Asp GluHis Gly Ile Lys 325 330 335 Ala Thr Ile Lys Thr Ile His Thr Ser Val ArgAsp Gly Leu Val Ser 340 345 350 Leu Ala Arg Ser Leu Gly Leu Val Val SerVal Asn Ala Glu Pro Ala 355 360 365 Lys Val Asp Met Asn Gly Thr Lys HisLys Ile Ser Tyr Ala Ile Tyr 370 375 380 Met Ser Gly Gly Asp Val Ser LeuAsn Val Leu Ser Lys Cys Ala Gly 385 390 395 400 Ser Lys Lys Phe Arg ProAla Pro Ala Ala Ala Phe Ala Arg Glu Cys 405 410 415 Arg Gly Phe Tyr PheGlu Leu Gln Glu Leu Lys Glu Asp Asp Tyr Tyr 420 425 430 Gly Ile Thr LeuSer Asp Asp Ser Asp His Gln Phe Leu Leu Ala Asn 435 440 445 Gln Val ValVal His Asn 450 4 454 PRT Artificial Sequence Description of ArtificialSequence Synthetic VMA allele mutation 4 Cys Phe Ala Lys Gly Thr Asn ValLeu Met Ala Asp Gly Ser Ile Glu 1 5 10 15 Cys Ile Glu Asn Ile Glu ValGly Asn Lys Val Met Gly Lys Asp Gly 20 25 30 Arg Pro Arg Glu Val Ile LysLeu Pro Arg Gly Arg Glu Thr Met Tyr 35 40 45 Ser Val Val Gln Lys Ser GlnHis Arg Ala His Lys Ser Asp Ser Ser 50 55 60 Arg Glu Val Pro Glu Leu LeuLys Phe Thr Cys Asn Ala Thr His Glu 65 70 75 80 Leu Val Val Arg Thr ProArg Ser Val Arg Arg Leu Ser Arg Thr Ile 85 90 95 Lys Gly Val Glu Tyr PheGlu Val Ile Thr Phe Glu Met Gly Gln Lys 100 105 110 Lys Ala Pro Asp GlyArg Ile Val Glu Phe Val Lys Glu Val Ser Lys 115 120 125 Ser Tyr Pro IleSer Glu Gly Pro Glu Arg Ala Asn Glu Leu Val Glu 130 135 140 Ser Tyr ArgLys Ala Ser Asn Lys Ala Tyr Phe Glu Trp Thr Ile Glu 145 150 155 160 AlaArg Asp Leu Ser Pro Leu Gly Ser His Val Arg Lys Ala Thr Tyr 165 170 175Gln Thr Tyr Ala Pro Ile Leu Tyr Glu Asn Asp His Phe Phe Asp Tyr 180 185190 Met Gln Lys Ser Lys Phe His Leu Thr Ile Glu Gly Pro Lys Val Leu 195200 205 Ala Tyr Leu Leu Gly Leu Trp Ile Gly Asp Gly Leu Ser Asp Arg Ala210 215 220 Thr Phe Ser Val Asp Ser Arg Asp Thr Ser Leu Met Glu Arg ValThr 225 230 235 240 Glu Tyr Ala Glu Lys Leu Asn Leu Cys Ala Glu Tyr LysAsp Arg Lys 245 250 255 Glu Pro Arg Val Ala Lys Thr Val Asn Leu Tyr SerLys Val Val Arg 260 265 270 Gly Asn Gly Ile 
Arg Asn Asn Leu Asn Thr GluAsn Pro Leu Trp Asp 275 280 285 Ala Ile Val Gly Leu Gly Phe Leu Lys AspGly Val Lys Asn Ile Pro 290 295 300 Ser Phe Leu Ser Thr Asp Asn Ile GlyThr Arg Glu Thr Phe Leu Ala 305 310 315 320 Gly Leu Ile Asp Ser Asp GlyTyr Val Thr Asp Glu His Gly Ile Lys 325 330 335 Ala Thr Ile Lys Thr IleHis Thr Ser Val Arg Asp Gly Leu Val Ser 340 345 350 Leu Ala Arg Ser LeuGly Leu Val Val Ser Val Asn Ala Glu Pro Ala 355 360 365 Lys Val Asp MetAsn Gly Thr Lys His Lys Ile Ser Tyr Ala Ile Tyr 370 375 380 Met Ser GlyGly Asp Val Leu Leu Asn Val Leu Ser Lys Cys Ala Gly 385 390 395 400 SerLys Lys Phe Arg Pro Ala Pro Ala Ala Ala Phe Ala Arg Glu Cys 405 410 415Arg Gly Phe Tyr Phe Glu Leu Gln Glu Leu Lys Glu Asp Asp Tyr Tyr 420 425430 Gly Ile Thr Leu Ser Asp Asp Ser Asp His Gln Phe Leu Leu Ala Asn 435440 445 Gln Val Val Val His Asn 450 5 454 PRT Artificial SequenceDescription of Artificial Sequence Synthetic VMA allele mutation 5 CysPhe Ala Lys Gly Thr Asn Val Leu Met Ala Asp Gly Ser Ile Glu 1 5 10 15Cys Ile Glu Asn Ile Glu Val Gly Asn Lys Val Met Gly Lys Asp Gly 20 25 30Arg Pro Arg Glu Val Ile Lys Leu Pro Arg Gly Arg Glu Thr Met Tyr 35 40 45Ser Val Val Gln Lys Ser Gln His Arg Ala His Lys Ser Asp Ser Ser 50 55 60Arg Glu Val Pro Glu Leu Leu Lys Phe Thr Cys Asn Ala Thr His Glu 65 70 7580 Leu Val Val Arg Thr Pro Arg Ser Val Arg Arg Leu Ser Arg Thr Ile 85 9095 Lys Gly Val Glu Tyr Phe Glu Val Ile Thr Phe Glu Met Gly Gln Lys 100105 110 Lys Ala Pro Asp Gly Arg Ile Val Glu Leu Val Lys Glu Val Ser Lys115 120 125 Ser Tyr Pro Ile Ser Glu Gly Pro Glu Arg Ala Asn Glu Leu ValGlu 130 135 140 Ser Tyr Arg Lys Ala Ser Asn Lys Ala Tyr Phe Glu Trp ThrIle Glu 145 150 155 160 Ala Arg Asp Leu Ser Leu Leu Gly Ser His Val ArgLys Ala Thr Tyr 165 170 175 Gln Thr Tyr Ala Pro Ile Leu Tyr Glu Asn AspHis Phe Phe Asp Tyr 180 185 190 Met Gln Lys Ser Lys Phe His Leu Thr IleGlu Gly Pro Lys Val Leu 195 200 205 Ala Tyr Leu Leu Gly Leu Trp Ile GlyAsp Gly Leu Ser Asp Arg Ala 210 215 220 Thr Phe Ser Val Asp 
Ser Arg AspThr Ser Leu Met Glu Arg Val Thr 225 230 235 240 Glu Tyr Ala Glu Lys LeuAsn Leu Cys Ala Glu Tyr Lys Asp Arg Lys 245 250 255 Glu Pro Gln Val AlaLys Thr Val Asn Leu Tyr Ser Lys Val Val Arg 260 265 270 Gly Asn Gly IleArg Asn Asn Leu Asn Thr Glu Asn Pro Leu Trp Asp 275 280 285 Ala Ile ValGly Leu Gly Phe Leu Lys Asp Gly Val Lys Asn Ile Pro 290 295 300 Ser PheLeu Ser Thr Asp Asn Ile Gly Thr Arg Glu Thr Phe Leu Ala 305 310 315 320Gly Leu Ile Gly Ser Asp Gly Tyr Val Thr Asp Glu His Gly Ile Lys 325 330335 Ala Thr Ile Lys Thr Ile His Thr Ser Val Arg Asp Gly Leu Val Ser 340345 350 Leu Ala Arg Ser Leu Gly Leu Val Val Ser Val Asn Ala Glu Pro Ala355 360 365 Lys Val Asp Met Asn Gly Thr Lys His Lys Ile Ser Tyr Ala IleTyr 370 375 380 Met Ser Gly Gly Asp Val Leu Leu Asn Val Leu Ser Lys CysAla Gly 385 390 395 400 Ser Lys Lys Phe Arg Pro Ala Pro Ala Ala Ala PheAla Arg Glu Cys 405 410 415 Arg Gly Phe Tyr Phe Glu Leu Gln Glu Leu LysGlu Asp Asp Tyr Tyr 420 425 430 Gly Ile Thr Leu Ser Asp Asp Ser Asp HisGln Phe Leu Leu Ala Asn 435 440 445 Gln Val Val Val His Asn 450 6 454PRT Artificial Sequence Description of Artificial Sequence Synthetic VMAallele mutation 6 Cys Phe Ala Lys Gly Thr Asn Val Leu Met Ala Asp GlySer Ile Glu 1 5 10 15 Cys Ile Glu Asn Ile Glu Val Gly Asn Lys Val MetGly Lys Asp Gly 20 25 30 Arg Pro Arg Glu Val Ile Lys Leu Pro Arg Gly ArgGlu Thr Met Tyr 35 40 45 Ser Val Val Gln Lys Ser Gln His Arg Ala His LysSer Asp Ser Ser 50 55 60 Arg Glu Val Pro Glu Leu Leu Lys Phe Thr Cys AsnAla Thr His Glu 65 70 75 80 Leu Val Val Arg Thr Pro Arg Ser Val Arg ArgLeu Ser Arg Thr Ile 85 90 95 Lys Gly Val Glu Tyr Phe Glu Val Ile Thr PheGlu Met Gly Gln Lys 100 105 110 Lys Ala Pro Asp Gly Arg Ile Val Glu LeuVal Lys Glu Val Ser Lys 115 120 125 Ser Tyr Pro Ile Ser Glu Gly Pro GluArg Ala Asn Glu Leu Val Glu 130 135 140 Ser Tyr Arg Lys Ala Pro Asn LysAla Tyr Leu Glu Trp Thr Ile Glu 145 150 155 160 Ala Arg Asp Leu Ser LeuLeu Gly Ser His Val Arg Lys Ala Thr Tyr 165 170 175 Gln Thr Tyr Ala 
ProIle Leu Tyr Glu Asn Asp His Phe Phe Asp Tyr 180 185 190 Met Gln Lys SerLys Phe His Leu Thr Ile Glu Gly Pro Lys Val Leu 195 200 205 Ala Tyr LeuLeu Gly Leu Trp Ile Gly Asp Gly Leu Ser Asp Arg Ala 210 215 220 Thr PheSer Val Asp Ser Arg Asp Ala Ser Leu Met Glu Arg Val Thr 225 230 235 240Glu Tyr Ala Glu Lys Leu Ser Leu Cys Ala Glu Tyr Lys Asp Arg Lys 245 250255 Glu Pro Gln Val Ala Lys Thr Val Asn Leu Tyr Ser Lys Val Val Arg 260265 270 Gly Asn Gly Ile Arg Asn Asn Leu Asn Thr Glu Asp Pro Leu Trp Asp275 280 285 Ala Ile Val Gly Leu Gly Phe Leu Lys Asp Gly Val Lys Asn IlePro 290 295 300 Ser Phe Leu Ser Thr Asp Asn Ile Gly Thr Arg Glu Thr PheLeu Ala 305 310 315 320 Gly Leu Ile Asp Ser Asp Gly Tyr Val Thr Asp GluHis Gly Ile Lys 325 330 335 Ala Thr Ile Lys Thr Ile His Thr Ser Val ArgAsp Gly Leu Val Ser 340 345 350 Leu Ala Arg Ser Leu Gly Leu Val Val SerVal Asn Ala Glu Pro Ala 355 360 365 Lys Val Asp Met Asn Gly Thr Lys HisLys Ile Ser Tyr Ala Ile Tyr 370 375 380 Met Ser Gly Gly Asp Val Leu LeuAsn Val Leu Ser Lys Cys Ala Gly 385 390 395 400 Ser Lys Lys Phe Arg ProAla Pro Ala Ala Ala Phe Ala Arg Glu Cys 405 410 415 Arg Gly Phe Tyr PheGlu Leu Gln Glu Leu Lys Glu Asp Asp Tyr Tyr 420 425 430 Gly Ile Thr LeuSer Asp Asp Ser Asp His Gln Phe Leu Leu Ala Asn 435 440 445 Gln Ala ValVal His Asn 450 7 454 PRT Artificial Sequence Description of ArtificialSequence Synthetic VMA allele mutation 7 Cys Phe Ala Lys Gly Thr Asn ValLeu Met Ala Asp Gly Ser Ile Glu 1 5 10 15 Cys Ile Glu Asn Ile Lys ValGly Asn Lys Val Met Gly Lys Asp Gly 20 25 30 Arg Pro Arg Glu Val Ile LysLeu Pro Arg Gly Arg Glu Thr Val Tyr 35 40 45 Ser Val Val Gln Lys Ser GlnHis Arg Ala His Lys Ser Asp Ser Ser 50 55 60 Arg Glu Val Pro Glu Leu LeuLys Phe Thr Cys Asn Ala Thr His Glu 65 70 75 80 Leu Val Val Arg Thr ProArg Ser Val Arg Arg Leu Ser Arg Thr Ile 85 90 95 Lys Gly Val Glu Tyr LeuGlu Val Ile Thr Phe Glu Met Gly Gln Lys 100 105 110 Lys Ala Pro Asp GlyArg Ile Val Glu Leu Val Lys Glu Val Ser Lys 115 120 125 Ser Tyr Pro IleSer 
Glu Gly Pro Glu Arg Ala Asn Glu Leu Val Glu 130 135 140 Ser Tyr ArgLys Ala Ser Asn Lys Ala Tyr Phe Glu Trp Thr Ile Glu 145 150 155 160 AlaArg Asp Leu Ser Leu Ser Gly Ser His Val Arg Lys Ala Thr Tyr 165 170 175Gln Thr Tyr Ala Pro Ile Leu Tyr Glu Asn Asp His Phe Phe Asp Tyr 180 185190 Met Gln Lys Ser Lys Phe His Leu Thr Ile Glu Gly Pro Lys Val Leu 195200 205 Ala Tyr Leu Leu Gly Leu Trp Ile Gly Asp Gly Leu Ser Asp Arg Ala210 215 220 Thr Phe Ser Val Asp Ser Arg Asp Thr Ser Leu Met Glu Arg ValThr 225 230 235 240 Glu Tyr Ala Glu Lys Leu Asn Leu Cys Ala Glu Tyr LysAsp Arg Lys 245 250 255 Glu Pro Gln Val Ala Lys Thr Val Asn Leu Tyr SerLys Val Val Arg 260 265 270 Gly Asn Gly Ile Arg Asn Asn Leu Asn Thr GluAsn Pro Leu Trp Asp 275 280 285 Ala Ile Val Gly Leu Gly Phe Leu Lys AspGly Val Lys Asn Ile Pro 290 295 300 Ser Phe Leu Ser Thr Asp Asn Ile GlyThr Arg Glu Thr Phe Leu Ala 305 310 315 320 Gly Leu Ile Asp Ser Asp GlyTyr Val Thr Asp Glu His Gly Ile Lys 325 330 335 Ala Thr Ile Lys Thr IleHis Thr Ser Val Arg Asp Gly Leu Val Ser 340 345 350 Leu Ala Arg Ser LeuGly Leu Val Val Ser Val Asn Ala Glu Pro Ala 355 360 365 Lys Val Asp MetAsn Gly Thr Lys His Lys Ile Ser Tyr Ala Ile Tyr 370 375 380 Met Ser GlyGly Asp Val Leu Leu Asn Val Leu Ser Lys Cys Ala Gly 385 390 395 400 SerLys Lys Phe Arg Pro Ala Pro Ala Ala Ala Phe Ala Arg Glu Cys 405 410 415Arg Gly Phe Tyr Phe Glu Leu Gln Glu Leu Lys Glu Asp Asp Tyr Tyr 420 425430 Gly Ile Thr Leu Ser Asp Asp Ser Asp His Gln Phe Leu Leu Ala Asn 435440 445 Gln Val Val Val His Asn 450 8 454 PRT Artificial SequenceDescription of Artificial Sequence Synthetic VMA allele mutation 8 CysPhe Ala Lys Gly Thr Asn Val Leu Met Ala Asp Gly Ser Ile Glu 1 5 10 15Cys Ile Glu Asn Ile Glu Val Gly Asn Lys Val Met Gly Lys Gly Gly 20 25 30Arg Pro Arg Gly Val Ile Lys Leu Pro Arg Gly Arg Glu Thr Met Tyr 35 40 45Ser Val Val Gln Lys Ser Gln His Arg Ala His Lys Ser Asp Pro Ser 50 55 60Arg Glu Val Pro Glu Leu Leu Lys Phe Thr Cys Asn Ala Thr His Glu 65 70 7580 Leu Val Val Arg 
Thr Pro Arg Ser Val Arg Arg Leu Ser Arg Thr Ile 85 9095 Lys Gly Val Glu Tyr Phe Glu Val Ile Thr Phe Glu Met Gly Gln Lys 100105 110 Lys Ala Pro Asp Gly Arg Ile Val Glu Leu Val Lys Glu Val Ser Lys115 120 125 Ser Tyr Pro Ile Ser Glu Gly Pro Gly Arg Ala Asn Glu Leu ValGlu 130 135 140 Ser Tyr Arg Lys Ala Ser Asn Lys Ala Cys Phe Glu Trp ThrIle Glu 145 150 155 160 Ala Arg Asp Leu Ser Leu Leu Gly Ser His Val ArgLys Ala Thr Tyr 165 170 175 Gln Thr Tyr Ala Pro Ile Leu Tyr Glu Asn AspHis Phe Phe Asp Tyr 180 185 190 Met Gln Lys Ser Lys Phe His Leu Thr IleGlu Gly Pro Lys Val Leu 195 200 205 Ala Tyr Leu Leu Gly Leu Trp Ile GlyAsp Gly Leu Ser Asp Arg Ala 210 215 220 Thr Phe Ser Val Asp Ser Arg AspThr Ser Leu Met Glu Arg Val Thr 225 230 235 240 Glu Tyr Ala Glu Lys LeuAsn Leu Cys Ala Glu Tyr Lys Asp Arg Lys 245 250 255 Glu Pro Gln Val AlaLys Thr Val Asn Leu Tyr Ser Lys Val Val Arg 260 265 270 Gly Asn Gly IleArg Asn Asn Leu Ser Thr Glu Asn Pro Leu Trp Asp 275 280 285 Ala Ile ValGly Leu Gly Phe Leu Lys Asp Gly Val Lys Asn Ile Pro 290 295 300 Ser PheLeu Ser Thr Asp Asn Ile Gly Thr Arg Glu Thr Phe Leu Ala 305 310 315 320Gly Leu Ile Asp Ser Asp Gly Tyr Val Thr Asp Glu His Gly Ile Lys 325 330335 Ala Thr Ile Lys Thr Ile His Thr Ser Val Arg Asp Gly Leu Val Ser 340345 350 Leu Ala Arg Ser Leu Gly Leu Val Val Ser Val Asn Ala Glu Pro Ala355 360 365 Lys Val Asp Met Asn Gly Thr Lys His Lys Ile Ser Tyr Ala IleTyr 370 375 380 Met Ser Gly Gly Asp Val Leu Leu Asn Val Leu Ser Lys CysAla Gly 385 390 395 400 Ser Lys Lys Phe Arg Pro Ala Pro Ala Ala Ala PheAla Arg Glu Cys 405 410 415 Arg Gly Phe Tyr Phe Glu Leu Gln Glu Leu LysGlu Asp Asp Tyr Tyr 420 425 430 Gly Ile Thr Leu Ser Asp Asp Ser Asp HisGln Phe Leu Leu Ala Asn 435 440 445 Gln Val Val Val His Asn 450 9 454PRT Artificial Sequence Description of Artificial Sequence Synthetic VMAallele mutation 9 Cys Phe Ala Lys Gly Thr Asn Val Leu Met Ala Asp GlySer Ile Glu 1 5 10 15 Cys Ile Glu Asn Ile Glu Val Gly Asn Lys Val MetGly Lys Asp Gly 20 25 30 Arg Pro 
Arg Glu Val Ile Lys Leu Pro Arg Gly ArgGlu Thr Met Tyr 35 40 45 Ser Val Val Gln Lys Ser Gln His Arg Ala His LysSer Asp Ser Ser 50 55 60 Arg Glu Val Pro Glu Leu Leu Lys Phe Thr Cys AsnAla Thr His Glu 65 70 75 80 Leu Val Val Arg Thr Pro Arg Ser Val Arg ArgLeu Ser Arg Thr Ile 85 90 95 Lys Gly Val Glu Tyr Phe Lys Val Ile Thr PheGlu Met Gly Gln Lys 100 105 110 Lys Ala Pro Asp Gly Arg Ile Val Glu LeuVal Lys Glu Val Ser Lys 115 120 125 Ser Tyr Pro Ile Ser Glu Gly Pro GluArg Ala Asn Glu Leu Val Glu 130 135 140 Ser Tyr Arg Lys Ala Ser Asn LysAla Tyr Phe Glu Trp Thr Ile Glu 145 150 155 160 Ala Arg Asp Leu Ser LeuLeu Gly Ser His Val Arg Lys Ala Thr Tyr 165 170 175 Gln Thr Tyr Ala ProIle Leu Tyr Glu Asn Asp His Phe Phe Asp Tyr 180 185 190 Met Gln Lys SerLys Phe His Leu Thr Ile Glu Gly Pro Lys Val Leu 195 200 205 Ala Tyr LeuLeu Gly Leu Trp Ile Gly Asp Gly Leu Ser Asp Arg Ala 210 215 220 Thr PheSer Val Asp Ser Arg Asp Thr Ser Leu Met Glu Arg Val Thr 225 230 235 240Glu Tyr Ala Glu Lys Leu Asn Leu Cys Ala Glu Tyr Lys Asp Arg Lys 245 250255 Glu Pro Gln Val Ala Lys Thr Val Asn Leu Tyr Ser Lys Val Val Arg 260265 270 Gly Asn Gly Ile Arg Asn Asn Leu Asn Thr Glu Asn Pro Leu Trp Asp275 280 285 Ala Ile Val Gly Leu Gly Phe Leu Lys Asp Gly Val Lys Asn IlePro 290 295 300 Ser Phe Leu Ser Thr Asp Asn Ile Gly Thr Arg Glu Thr PheLeu Ala 305 310 315 320 Gly Leu Ile Asp Ser Asp Gly Tyr Val Thr Asp GluHis Gly Ile Lys 325 330 335 Ala Thr Ile Lys Thr Ile His Thr Ser Val ArgAsp Gly Leu Val Ser 340 345 350 Leu Ala Arg Phe Leu Gly Leu Val Val SerVal Asn Ala Glu Pro Ala 355 360 365 Lys Val Asp Met Asn Gly Thr Lys HisLys Ile Ser Tyr Ala Ile Tyr 370 375 380 Met Ser Gly Gly Asp Val Leu LeuAsn Val Leu Ser Lys Cys Ala Gly 385 390 395 400 Ser Lys Lys Phe Arg ProAla Pro Ala Ala Ala Phe Ala Arg Glu Cys 405 410 415 Arg Gly Phe Tyr PheGlu Leu Gln Glu Leu Lys Glu Asp Asp Tyr Tyr 420 425 430 Gly Ile Thr LeuSer Asp Asp Ser Asp His Gln Phe Leu Leu Ala Asn 435 440 445 Gln Val ValVal His Asn 450 10 454 PRT Artificial 
Sequence Description of ArtificialSequence Synthetic VMA allele mutation 10 Cys Phe Ala Lys Gly Thr AsnVal Leu Met Ala Asp Gly Ser Ile Glu 1 5 10 15 Cys Ile Glu Asn Ile GluVal Gly Asn Lys Val Met Gly Lys Asp Gly 20 25 30 Arg Pro Arg Glu Val IleLys Leu Pro Arg Gly Arg Glu Thr Met Tyr 35 40 45 Ser Val Val Gln Lys SerGln His Arg Ala His Lys Ser Asp Ser Ser 50 55 60 Arg Glu Val Pro Glu LeuLeu Lys Phe Thr Cys Asn Ala Thr His Glu 65 70 75 80 Leu Val Val Arg ThrPro Arg Ser Val Arg Arg Leu Ser Arg Thr Ile 85 90 95 Lys Gly Val Glu TyrPhe Glu Val Ile Thr Phe Glu Met Gly Gln Lys 100 105 110 Lys Ala Pro AspGly Arg Ile Val Glu Leu Val Lys Glu Val Ser Lys 115 120 125 Ser Tyr ProIle Ser Glu Gly Pro Glu Arg Ala Asn Glu Leu Val Glu 130 135 140 Ser TyrArg Lys Ala Ser Asn Lys Ala Tyr Phe Glu Arg Thr Ile Glu 145 150 155 160Ala Arg Asp Leu Ser Leu Leu Gly Ser His Val Arg Lys Ala Thr Tyr 165 170175 Gln Thr Tyr Ala Pro Ile Leu Tyr Glu Asn Asp His Phe Phe Asp Tyr 180185 190 Met Gln Lys Ser Lys Phe His Leu Thr Ile Glu Gly Pro Lys Val Leu195 200 205 Ala Tyr Leu Leu Gly Leu Trp Ile Gly Asp Ala Leu Ser Asp ArgAla 210 215 220 Thr Phe Ser Val Asp Ser Arg Asp Thr Ser Leu Met Glu ArgVal Thr 225 230 235 240 Glu Tyr Ala Glu Lys Leu Asn Leu Cys Ala Glu TyrLys Asp Arg Lys 245 250 255 Glu Pro Gln Val Ala Lys Thr Val Asn Leu TyrSer Lys Val Val Arg 260 265 270 Gly Asn Gly Ile Arg Asn Asn Leu Asn ThrGlu Asn Pro Leu Trp Asp 275 280 285 Ala Ile Val Gly Leu Gly Phe Leu LysAsp Gly Val Lys Asn Ile Pro 290 295 300 Ser Phe Leu Ser Thr Asp Asn IleGly Thr Arg Glu Thr Phe Leu Ala 305 310 315 320 Gly Leu Ile Asp Ser AspGly Tyr Val Thr Asp Glu His Gly Ile Lys 325 330 335 Ala Thr Ile Lys ThrIle His Thr Ser Val Arg Asp Gly Leu Val Ser 340 345 350 Leu Ala Arg SerLeu Gly Leu Val Val Ser Val Asn Ala Glu Pro Ala 355 360 365 Lys Val AspMet Asn Gly Thr Lys His Lys Ile Ser Tyr Ala Ile Tyr 370 375 380 Met SerGly Gly Asp Val Leu Leu Asn Val Leu Ser Lys Cys Ala Gly 385 390 395 400Ser Lys Lys Phe Arg Pro Ala Pro Ala Ala Ala Phe 
Ala Arg Glu Cys 405 410415 Arg Gly Phe Tyr Phe Glu Leu Gln Glu Leu Lys Glu Asp Asp Tyr Tyr 420425 430 Gly Ile Thr Leu Ser Asp Asp Ser Asp His Gln Phe Leu Leu Ala Asn435 440 445 Gln Val Val Val His Asn 450 11 454 PRT Artificial SequenceDescription of Artificial Sequence Synthetic VMA allele mutation 11 CysPhe Ala Lys Gly Thr Asn Val Leu Met Ala Asp Gly Ser Ile Glu 1 5 10 15Cys Ile Glu Asn Ile Glu Val Gly Asn Lys Val Met Gly Lys Asp Gly 20 25 30Arg Pro Arg Glu Val Ile Lys Leu Pro Arg Gly Arg Glu Thr Met Tyr 35 40 45Ser Val Val Gln Lys Ser Gln His Arg Ala His Lys Ser Asp Ser Ser 50 55 60Arg Glu Val Pro Glu Leu Leu Lys Phe Thr Cys Asn Ala Thr His Glu 65 70 7580 Leu Val Val Arg Thr Pro Arg Ser Val Arg Arg Leu Ser Arg Thr Ile 85 9095 Lys Gly Val Glu Tyr Phe Glu Val Ile Thr Phe Glu Met Gly Gln Lys 100105 110 Lys Ala Pro Asp Gly Arg Ile Val Glu Leu Val Lys Glu Val Ser Lys115 120 125 Ser Tyr Pro Ile Ser Glu Gly Pro Glu Arg Ala Asn Glu Leu ValGlu 130 135 140 Ser Tyr Arg Lys Ala Ser Asn Lys Ala Tyr Phe Glu Trp ThrIle Glu 145 150 155 160 Ala Arg Asp Leu Ser Leu Leu Gly Ser His Val ArgLys Ala Thr Tyr 165 170 175 Gln Thr Tyr Ala Pro Ile Leu Tyr Glu Asn AspHis Phe Phe Asp Tyr 180 185 190 Met Gln Lys Ser Lys Phe His Leu Thr IleGlu Gly Pro Lys Val Leu 195 200 205 Ala Tyr Leu Leu Gly Leu Trp Ile GlyAsp Gly Leu Ser Asp Arg Ala 210 215 220 Thr Phe Ser Val Asp Ser Arg AspThr Ser Leu Met Glu Arg Val Thr 225 230 235 240 Glu Tyr Ala Glu Lys LeuAsn Leu Cys Ala Glu Tyr Lys Asp Arg Lys 245 250 255 Glu Pro Gln Val AlaLys Thr Val Asn Leu Tyr Ser Lys Val Val Arg 260 265 270 Gly Asn Gly IleArg Asn Asn Leu Asn Thr Glu Asn Pro Leu Trp Asp 275 280 285 Ala Ile ValGly Leu Gly Phe Leu Lys Asp Gly Val Lys Asn Ile Pro 290 295 300 Ser PheLeu Ser Thr Asp Asn Ile Gly Thr Arg Glu Thr Phe Leu Ala 305 310 315 320Gly Leu Ile Asp Ser Asp Gly Tyr Val Thr Asp Glu His Gly Ile Lys 325 330335 Ala Thr Ile Lys Thr Ile His Thr Ser Val Arg Asp Gly Leu Val Ser 340345 350 Leu Ala Arg Ser Leu Gly Leu Val Val Ser Val Asn 
Ala Glu Pro Ala355 360 365 Lys Val Asp Met Asn Gly Thr Lys His Lys Ile Ser Tyr Ala IleTyr 370 375 380 Met Ser Gly Gly Asp Val Leu Leu Asn Val Leu Ser Lys CysAla Gly 385 390 395 400 Ser Lys Lys Phe Arg Pro Ala Pro Ala Ala Ala PheAla Arg Glu Cys 405 410 415 Arg Gly Phe Tyr Phe Glu Leu Gln Glu Leu LysGlu Asp Asp Tyr Tyr 420 425 430 Gly Ile Thr Leu Ser Asp Asp Ser Asp HisGln Phe Leu Leu Ala Asn 435 440 445 Gln Val Asn Val His Asn 450 12 454PRT Artificial Sequence Description of Artificial Sequence Synthetic VMAallele mutation 12 Cys Phe Ala Lys Gly Thr Asn Val Leu Met Ala Asp GlySer Ile Glu 1 5 10 15 Cys Ile Glu Asn Ile Glu Val Gly Asn Lys Val MetGly Lys Asp Gly 20 25 30 Arg Pro Arg Glu Val Ile Lys Leu Pro Arg Gly ArgGlu Thr Met Tyr 35 40 45 Ser Val Val Gln Lys Ser Gln His Arg Ala His LysSer Asp Ser Ser 50 55 60 Arg Glu Val Pro Glu Leu Leu Lys Phe Thr Cys AsnAla Thr His Glu 65 70 75 80 Leu Val Val Arg Thr Pro Arg Ser Val Arg ArgLeu Ser Arg Thr Ile 85 90 95 Lys Gly Val Glu Tyr Phe Glu Val Ile Thr PheGlu Met Gly Gln Lys 100 105 110 Lys Ala Pro Asp Gly Arg Ile Val Glu LeuVal Lys Glu Val Ser Lys 115 120 125 Ser Tyr Pro Ile Ser Glu Gly Pro GluArg Ala Asn Glu Leu Val Glu 130 135 140 Ser Tyr Arg Lys Ala Ser Asn LysAla Tyr Phe Glu Trp Thr Ile Glu 145 150 155 160 Ala Arg Asp Leu Ser LeuLeu Gly Ser His Val Arg Lys Ala Thr Tyr 165 170 175 Gln Thr Tyr Ala ProIle Leu Tyr Glu Asn Asp His Phe Phe Asp Tyr 180 185 190 Met Gln Lys SerLys Phe His Leu Thr Ile Glu Gly Pro Lys Val Leu 195 200 205 Ala Tyr LeuLeu Gly Leu Trp Ile Gly Asp Gly Leu Ser Asp Arg Ala 210 215 220 Thr PheSer Val Asp Ser Arg Asp Thr Ser Leu Met Glu Arg Val Thr 225 230 235 240Glu Tyr Ala Glu Lys Leu Asn Leu Cys Ala Glu Tyr Lys Asp Arg Lys 245 250255 Glu Pro Gln Val Ala Lys Thr Val Asn Leu Tyr Ser Lys Val Val Arg 260265 270 Gly Asn Gly Ile Arg Asn Asn Leu Asn Thr Glu Asn Pro Leu Trp Asp275 280 285 Ala Ile Val Gly Leu Gly Phe Leu Lys Asp Gly Val Lys Asn IlePro 290 295 300 Ser Phe Leu Ser Thr Asp Asn Ile Gly Thr Arg Glu 
Thr PheLeu Ala 305 310 315 320 Gly Leu Ile Asp Ser Asp Gly Tyr Val Thr Asp GluHis Gly Ile Lys 325 330 335 Ala Thr Ile Lys Thr Ile His Thr Ser Val ArgAsp Gly Leu Val Ser 340 345 350 Leu Ala Arg Ser Leu Gly Leu Val Val SerVal Asn Ala Glu Pro Ala 355 360 365 Lys Val Asp Met Asn Gly Thr Lys HisLys Ile Ser Tyr Ala Ile Tyr 370 375 380 Met Ser Gly Gly Asp Val Leu LeuAsn Val Leu Ser Lys Cys Ala Gly 385 390 395 400 Ser Lys Lys Phe Arg ProAla Pro Ala Ala Ala Phe Ala Arg Glu Cys 405 410 415 Arg Gly Phe Tyr PheGlu Leu Gln Glu Leu Lys Glu Asp Asp Tyr Tyr 420 425 430 Gly Ile Thr LeuSer Asp Asp Ser Asp His Gln Phe Leu Leu Ala Asn 435 440 445 Gln Val ThrGly His Asn 450 13 3096 DNA Saccharomyces cerevisiae CDS (1)..(3093) 13atg att ggt tgt gcc atg tac gaa ttg gtc aag gtc ggt cac gat aac 48 MetIle Gly Cys Ala Met Tyr Glu Leu Val Lys Val Gly His Asp Asn 1 5 10 15ctg gtg ggt gaa gtc att aga att gac ggt gac aag gcc acc atc caa 96 LeuVal Gly Glu Val Ile Arg Ile Asp Gly Asp Lys Ala Thr Ile Gln 20 25 30 gtttac gaa gaa act gca ggc ctt acg gtc ggt gac cct gtt ttg aga 144 Val TyrGlu Glu Thr Ala Gly Leu Thr Val Gly Asp Pro Val Leu Arg 35 40 45 aca ggtaag cct ctg tcg gta gaa ttg ggt cct ggt ctg atg gaa acc 192 Thr Gly LysPro Leu Ser Val Glu Leu Gly Pro Gly Leu Met Glu Thr 50 55 60 att tac gatggt att caa aga cct ttg aaa gcc att aag gaa gaa tcg 240 Ile Tyr Asp GlyIle Gln Arg Pro Leu Lys Ala Ile Lys Glu Glu Ser 65 70 75 80 caa tcg atttat atc cca aga ggt att gac act cca gct ttg gat agg 288 Gln Ser Ile TyrIle Pro Arg Gly Ile Asp Thr Pro Ala Leu Asp Arg 85 90 95 act atc aag tggcaa ttt act ccg gga aag ttt caa gtc ggc gat cat 336 Thr Ile Lys Trp GlnPhe Thr Pro Gly Lys Phe Gln Val Gly Asp His 100 105 110 att tcc ggt ggtgat att tac ggt tcc gtt ttt gag aat tcg cta att 384 Ile Ser Gly Gly AspIle Tyr Gly Ser Val Phe Glu Asn Ser Leu Ile 115 120 125 tca agc cat aagatt ctt ttg cca cca aga tca aga ggt aca atc act 432 Ser Ser His Lys IleLeu Leu Pro Pro Arg Ser Arg Gly Thr Ile Thr 130 135 140 tgg att gct ccagct 
ggt gag tac act ttg gat gag aag att ttg gaa 480 Trp Ile Ala Pro AlaGly Glu Tyr Thr Leu Asp Glu Lys Ile Leu Glu 145 150 155 160 gtt gaa tttgat ggc aag aag tct gat ttc act ctt tac cat act tgg 528 Val Glu Phe AspGly Lys Lys Ser Asp Phe Thr Leu Tyr His Thr Trp 165 170 175 cct gtt cgtgtt cca aga cca gtt act gaa aag tta tct gct gac tat 576 Pro Val Arg ValPro Arg Pro Val Thr Glu Lys Leu Ser Ala Asp Tyr 180 185 190 cct ttg ttaaca ggt caa aga gtt ttg gat gct ttg ttt cct tgt gtt 624 Pro Leu Leu ThrGly Gln Arg Val Leu Asp Ala Leu Phe Pro Cys Val 195 200 205 caa ggt ggtacg aca tgt att cca ggt gct ttt ggt tgt ggt aag acc 672 Gln Gly Gly ThrThr Cys Ile Pro Gly Ala Phe Gly Cys Gly Lys Thr 210 215 220 gtt atc tctcaa tct ttg tcc aag tac tcc aat tct gac gcc att atc 720 Val Ile Ser GlnSer Leu Ser Lys Tyr Ser Asn Ser Asp Ala Ile Ile 225 230 235 240 tat gtcggg tgc ttt gcc aag ggt acc aat gtt tta atg gcg gat ggg 768 Tyr Val GlyCys Phe Ala Lys Gly Thr Asn Val Leu Met Ala Asp Gly 245 250 255 tct attgaa tgt att gaa aac att gag gtt ggt aat aag gtc atg ggt 816 Ser Ile GluCys Ile Glu Asn Ile Glu Val Gly Asn Lys Val Met Gly 260 265 270 aaa gatggc aga cct cgt gag gta att aaa ttg ccc aga gga aga gaa 864 Lys Asp GlyArg Pro Arg Glu Val Ile Lys Leu Pro Arg Gly Arg Glu 275 280 285 act atgtac agc gtc gtg cag aaa agt cag cac aga gcc cac aaa agt 912 Thr Met TyrSer Val Val Gln Lys Ser Gln His Arg Ala His Lys Ser 290 295 300 gac tcaagt cgt gaa gtg cca gaa tta ctc aag ttt acg tgt aat gcg 960 Asp Ser SerArg Glu Val Pro Glu Leu Leu Lys Phe Thr Cys Asn Ala 305 310 315 320 acccat gag ttg gtt gtt aga aca cct cgt agt gtc cgc cgt ttg tct 1008 Thr HisGlu Leu Val Val Arg Thr Pro Arg Ser Val Arg Arg Leu Ser 325 330 335 cgtacc att aag ggt gtc gaa tat ttt gaa gtt att act ttt gag atg 1056 Arg ThrIle Lys Gly Val Glu Tyr Phe Glu Val Ile Thr Phe Glu Met 340 345 350 ggccaa aag aaa gcc ccc gac ggt aga att gtt gag ctt gtc aag gaa 1104 Gly GlnLys Lys Ala Pro Asp Gly Arg Ile Val Glu Leu Val Lys Glu 355 360 365 gtttca aag agc 
tac cca ata tct gag ggg cct gag aga gcc aac gaa 1152 Val SerLys Ser Tyr Pro Ile Ser Glu Gly Pro Glu Arg Ala Asn Glu 370 375 380 ttagta gaa tcc tat aga aag gct tca aat aaa gct tat ttt gag tgg 1200 Leu ValGlu Ser Tyr Arg Lys Ala Ser Asn Lys Ala Tyr Phe Glu Trp 385 390 395 400act att gag gcc aga gat ctt tct ctg ttg ggt tcc cat gtt cgt aaa 1248 ThrIle Glu Ala Arg Asp Leu Ser Leu Leu Gly Ser His Val Arg Lys 405 410 415gct acc tac cag act tac gct cca att ctt tat gag aat gac cac ttt 1296 AlaThr Tyr Gln Thr Tyr Ala Pro Ile Leu Tyr Glu Asn Asp His Phe 420 425 430ttc gac tac atg caa aaa agt aag ttt cat ctc acc att gaa ggt cca 1344 PheAsp Tyr Met Gln Lys Ser Lys Phe His Leu Thr Ile Glu Gly Pro 435 440 445aaa gta ctt gct tat tta ctt ggt tta tgg att ggt gat gga ttg tct 1392 LysVal Leu Ala Tyr Leu Leu Gly Leu Trp Ile Gly Asp Gly Leu Ser 450 455 460gac agg gca act ttt tcg gtt gat tcc aga gat act tct ttg atg gaa 1440 AspArg Ala Thr Phe Ser Val Asp Ser Arg Asp Thr Ser Leu Met Glu 465 470 475480 cgt gtt act gaa tat gct gaa aag ttg aat ttg tgc gcc gag tat aag 1488Arg Val Thr Glu Tyr Ala Glu Lys Leu Asn Leu Cys Ala Glu Tyr Lys 485 490495 gac aga aaa gaa cca caa gtt gcc aaa act gtt aat ttg tac tct aaa 1536Asp Arg Lys Glu Pro Gln Val Ala Lys Thr Val Asn Leu Tyr Ser Lys 500 505510 gtt gtc aga ggt aat ggt att cgc aat aat ctt aat act gag aat cca 1584Val Val Arg Gly Asn Gly Ile Arg Asn Asn Leu Asn Thr Glu Asn Pro 515 520525 tta tgg gac gct att gtt ggc tta gga ttc ttg aag gac ggt gtc aaa 1632Leu Trp Asp Ala Ile Val Gly Leu Gly Phe Leu Lys Asp Gly Val Lys 530 535540 aat att cct tct ttc ttg tct acg gac aat atc ggt act cgt gaa aca 1680Asn Ile Pro Ser Phe Leu Ser Thr Asp Asn Ile Gly Thr Arg Glu Thr 545 550555 560 ttt ctt gct ggt cta att gat tct gat ggc tat gtt act gat gag cat1728 Phe Leu Ala Gly Leu Ile Asp Ser Asp Gly Tyr Val Thr Asp Glu His 565570 575 ggt att aaa gca aca ata aag aca att cat act tct gtc aga gat ggt1776 Gly Ile Lys Ala Thr Ile Lys Thr Ile His Thr Ser Val Arg Asp Gly 580585 590 
ttg gtt tcc ctt gct cgt tct tta ggc tta gta gtc tcg gtt aac gca1824 Leu Val Ser Leu Ala Arg Ser Leu Gly Leu Val Val Ser Val Asn Ala 595600 605 gaa cct gct aag gtt gac atg aat ggc acc aaa cat aaa att agt tat1872 Glu Pro Ala Lys Val Asp Met Asn Gly Thr Lys His Lys Ile Ser Tyr 610615 620 gct att tat atg tct ggt gga gat gtt ttg ctt aac gtt ctt tcg aag1920 Ala Ile Tyr Met Ser Gly Gly Asp Val Leu Leu Asn Val Leu Ser Lys 625630 635 640 tgt gcc ggc tct aaa aaa ttc agg cct gct ccc gcc gct gct tttgca 1968 Cys Ala Gly Ser Lys Lys Phe Arg Pro Ala Pro Ala Ala Ala Phe Ala645 650 655 cgt gag tgc cgc gga ttt tat ttc gag tta caa gaa ttg aag gaagac 2016 Arg Glu Cys Arg Gly Phe Tyr Phe Glu Leu Gln Glu Leu Lys Glu Asp660 665 670 gat tat tat ggg att act tta tct gat gat tct gat cat cag tttttg 2064 Asp Tyr Tyr Gly Ile Thr Leu Ser Asp Asp Ser Asp His Gln Phe Leu675 680 685 ctt gcc aac cag gtt gtc gtc cat aat tgc gga gaa aga ggt aatgaa 2112 Leu Ala Asn Gln Val Val Val His Asn Cys Gly Glu Arg Gly Asn Glu690 695 700 atg gca gaa gtc ttg atg gaa ttc cca gag tta tat act gaa atgagc 2160 Met Ala Glu Val Leu Met Glu Phe Pro Glu Leu Tyr Thr Glu Met Ser705 710 715 720 ggt act aaa gaa cca att atg aag cgt act act ttg gtc gctaat aca 2208 Gly Thr Lys Glu Pro Ile Met Lys Arg Thr Thr Leu Val Ala AsnThr 725 730 735 tct aac atg ccg gtt gca gcc aga gaa gct tct att tac actggt atc 2256 Ser Asn Met Pro Val Ala Ala Arg Glu Ala Ser Ile Tyr Thr GlyIle 740 745 750 act ctt gca gaa tac ttc aga gat caa ggt aaa aat gtt tctatg att 2304 Thr Leu Ala Glu Tyr Phe Arg Asp Gln Gly Lys Asn Val Ser MetIle 755 760 765 gca gac tct tct tca aga tgg gct gaa gct ttg aga gaa atttct ggt 2352 Ala Asp Ser Ser Ser Arg Trp Ala Glu Ala Leu Arg Glu Ile SerGly 770 775 780 cgt ttg ggt gag atg cct gct gat caa ggt ttc cca gct tatttg ggt 2400 Arg Leu Gly Glu Met Pro Ala Asp Gln Gly Phe Pro Ala Tyr LeuGly 785 790 795 800 gct aag ttg gcc tcc ttt tac gaa aga gcc ggt aaa gctgtt gct tta 2448 Ala Lys Leu Ala Ser Phe Tyr Glu Arg Ala Gly Lys Ala ValAla 
Leu 805 810 815 ggt tcc cca gat cgt act ggt tcc gtt tcc atc gtt gct gcc gtt tcg 2496 Gly Ser Pro Asp Arg Thr Gly Ser Val Ser Ile Val Ala Ala Val Ser 820 825 830 cca gcc gat ggt gat ttc tca gat cct gtt act act gct aca ttg ggt 2544 Pro Ala Asp Gly Asp Phe Ser Asp Pro Val Thr Thr Ala Thr Leu Gly 835 840 845 atc act caa gtc ttt tgg ggt tta gac aag aaa ttg gct caa aga aag 2592 Ile Thr Gln Val Phe Trp Gly Leu Asp Lys Lys Leu Ala Gln Arg Lys 850 855 860 cat ttc cca tct atc aac aca tct gtt tct tac tcc aaa tac act aat 2640 His Phe Pro Ser Ile Asn Thr Ser Val Ser Tyr Ser Lys Tyr Thr Asn 865 870 875 880 gtc ttg aac aag ttt tat gat tcc aat tac cct gaa ttt cct gtt tta 2688 Val Leu Asn Lys Phe Tyr Asp Ser Asn Tyr Pro Glu Phe Pro Val Leu 885 890 895 aga gat cgt atg aag gaa att cta tca aac gct gaa gaa tta gaa caa 2736 Arg Asp Arg Met Lys Glu Ile Leu Ser Asn Ala Glu Glu Leu Glu Gln 900 905 910 gtt gtt caa tta gtt ggt aaa tcg gcc ttg tct gat agt gat aag att 2784 Val Val Gln Leu Val Gly Lys Ser Ala Leu Ser Asp Ser Asp Lys Ile 915 920 925 act ttg gat gtt gcc act tta atc aag gaa gat ttc ttg caa caa aat 2832 Thr Leu Asp Val Ala Thr Leu Ile Lys Glu Asp Phe Leu Gln Gln Asn 930 935 940 ggt tac tcc act tat gat gct ttc tgt cca att tgg aag aca ttt gat 2880 Gly Tyr Ser Thr Tyr Asp Ala Phe Cys Pro Ile Trp Lys Thr Phe Asp 945 950 955 960 atg atg aga gcc ttc atc tcg tat cat gac gaa gct caa aaa gct gtt 2928 Met Met Arg Ala Phe Ile Ser Tyr His Asp Glu Ala Gln Lys Ala Val 965 970 975 gct aat ggt gcc aac tgg tca aaa cta gct gac tct act ggt gac gtt 2976 Ala Asn Gly Ala Asn Trp Ser Lys Leu Ala Asp Ser Thr Gly Asp Val 980 985 990 aag cat gcc gtt tct tca tct aaa ttt ttt gaa cca agc agg ggt gaa 3024 Lys His Ala Val Ser Ser Ser Lys Phe Phe Glu Pro Ser Arg Gly Glu 995 1000 1005 aag gaa gtc cat ggc gaa ttc gaa aaa ttg ttg agc act atg caa gaa 3072 Lys Glu Val His Gly Glu Phe Glu Lys Leu Leu Ser Thr Met Gln Glu 1010 1015 1020 aga ttt gct gaa tct acc gat taa 3096 Arg Phe Ala Glu Ser Thr Asp 1025 1030 14 1031 PRT Saccharomyces cerevisiae
14 Met Ile Gly Cys Ala Met Tyr Glu Leu Val Lys Val Gly HisAsp Asn 1 5 10 15 Leu Val Gly Glu Val Ile Arg Ile Asp Gly Asp Lys AlaThr Ile Gln 20 25 30 Val Tyr Glu Glu Thr Ala Gly Leu Thr Val Gly Asp ProVal Leu Arg 35 40 45 Thr Gly Lys Pro Leu Ser Val Glu Leu Gly Pro Gly LeuMet Glu Thr 50 55 60 Ile Tyr Asp Gly Ile Gln Arg Pro Leu Lys Ala Ile LysGlu Glu Ser 65 70 75 80 Gln Ser Ile Tyr Ile Pro Arg Gly Ile Asp Thr ProAla Leu Asp Arg 85 90 95 Thr Ile Lys Trp Gln Phe Thr Pro Gly Lys Phe GlnVal Gly Asp His 100 105 110 Ile Ser Gly Gly Asp Ile Tyr Gly Ser Val PheGlu Asn Ser Leu Ile 115 120 125 Ser Ser His Lys Ile Leu Leu Pro Pro ArgSer Arg Gly Thr Ile Thr 130 135 140 Trp Ile Ala Pro Ala Gly Glu Tyr ThrLeu Asp Glu Lys Ile Leu Glu 145 150 155 160 Val Glu Phe Asp Gly Lys LysSer Asp Phe Thr Leu Tyr His Thr Trp 165 170 175 Pro Val Arg Val Pro ArgPro Val Thr Glu Lys Leu Ser Ala Asp Tyr 180 185 190 Pro Leu Leu Thr GlyGln Arg Val Leu Asp Ala Leu Phe Pro Cys Val 195 200 205 Gln Gly Gly ThrThr Cys Ile Pro Gly Ala Phe Gly Cys Gly Lys Thr 210 215 220 Val Ile SerGln Ser Leu Ser Lys Tyr Ser Asn Ser Asp Ala Ile Ile 225 230 235 240 TyrVal Gly Cys Phe Ala Lys Gly Thr Asn Val Leu Met Ala Asp Gly 245 250 255Ser Ile Glu Cys Ile Glu Asn Ile Glu Val Gly Asn Lys Val Met Gly 260 265270 Lys Asp Gly Arg Pro Arg Glu Val Ile Lys Leu Pro Arg Gly Arg Glu 275280 285 Thr Met Tyr Ser Val Val Gln Lys Ser Gln His Arg Ala His Lys Ser290 295 300 Asp Ser Ser Arg Glu Val Pro Glu Leu Leu Lys Phe Thr Cys AsnAla 305 310 315 320 Thr His Glu Leu Val Val Arg Thr Pro Arg Ser Val ArgArg Leu Ser 325 330 335 Arg Thr Ile Lys Gly Val Glu Tyr Phe Glu Val IleThr Phe Glu Met 340 345 350 Gly Gln Lys Lys Ala Pro Asp Gly Arg Ile ValGlu Leu Val Lys Glu 355 360 365 Val Ser Lys Ser Tyr Pro Ile Ser Glu GlyPro Glu Arg Ala Asn Glu 370 375 380 Leu Val Glu Ser Tyr Arg Lys Ala SerAsn Lys Ala Tyr Phe Glu Trp 385 390 395 400 Thr Ile Glu Ala Arg Asp LeuSer Leu Leu Gly Ser His Val Arg Lys 405 410 415 Ala Thr Tyr Gln Thr TyrAla Pro Ile Leu Tyr 
Glu Asn Asp His Phe 420 425 430 Phe Asp Tyr Met GlnLys Ser Lys Phe His Leu Thr Ile Glu Gly Pro 435 440 445 Lys Val Leu AlaTyr Leu Leu Gly Leu Trp Ile Gly Asp Gly Leu Ser 450 455 460 Asp Arg AlaThr Phe Ser Val Asp Ser Arg Asp Thr Ser Leu Met Glu 465 470 475 480 ArgVal Thr Glu Tyr Ala Glu Lys Leu Asn Leu Cys Ala Glu Tyr Lys 485 490 495Asp Arg Lys Glu Pro Gln Val Ala Lys Thr Val Asn Leu Tyr Ser Lys 500 505510 Val Val Arg Gly Asn Gly Ile Arg Asn Asn Leu Asn Thr Glu Asn Pro 515520 525 Leu Trp Asp Ala Ile Val Gly Leu Gly Phe Leu Lys Asp Gly Val Lys530 535 540 Asn Ile Pro Ser Phe Leu Ser Thr Asp Asn Ile Gly Thr Arg GluThr 545 550 555 560 Phe Leu Ala Gly Leu Ile Asp Ser Asp Gly Tyr Val ThrAsp Glu His 565 570 575 Gly Ile Lys Ala Thr Ile Lys Thr Ile His Thr SerVal Arg Asp Gly 580 585 590 Leu Val Ser Leu Ala Arg Ser Leu Gly Leu ValVal Ser Val Asn Ala 595 600 605 Glu Pro Ala Lys Val Asp Met Asn Gly ThrLys His Lys Ile Ser Tyr 610 615 620 Ala Ile Tyr Met Ser Gly Gly Asp ValLeu Leu Asn Val Leu Ser Lys 625 630 635 640 Cys Ala Gly Ser Lys Lys PheArg Pro Ala Pro Ala Ala Ala Phe Ala 645 650 655 Arg Glu Cys Arg Gly PheTyr Phe Glu Leu Gln Glu Leu Lys Glu Asp 660 665 670 Asp Tyr Tyr Gly IleThr Leu Ser Asp Asp Ser Asp His Gln Phe Leu 675 680 685 Leu Ala Asn GlnVal Val Val His Asn Cys Gly Glu Arg Gly Asn Glu 690 695 700 Met Ala GluVal Leu Met Glu Phe Pro Glu Leu Tyr Thr Glu Met Ser 705 710 715 720 GlyThr Lys Glu Pro Ile Met Lys Arg Thr Thr Leu Val Ala Asn Thr 725 730 735Ser Asn Met Pro Val Ala Ala Arg Glu Ala Ser Ile Tyr Thr Gly Ile 740 745750 Thr Leu Ala Glu Tyr Phe Arg Asp Gln Gly Lys Asn Val Ser Met Ile 755760 765 Ala Asp Ser Ser Ser Arg Trp Ala Glu Ala Leu Arg Glu Ile Ser Gly770 775 780 Arg Leu Gly Glu Met Pro Ala Asp Gln Gly Phe Pro Ala Tyr LeuGly 785 790 795 800 Ala Lys Leu Ala Ser Phe Tyr Glu Arg Ala Gly Lys AlaVal Ala Leu 805 810 815 Gly Ser Pro Asp Arg Thr Gly Ser Val Ser Ile ValAla Ala Val Ser 820 825 830 Pro Ala Asp Gly Asp Phe Ser Asp Pro Val ThrThr Ala Thr Leu Gly 835 840 845 
Ile Thr Gln Val Phe Trp Gly Leu Asp Lys Lys Leu Ala Gln Arg Lys 850 855 860 His Phe Pro Ser Ile Asn Thr Ser Val Ser Tyr Ser Lys Tyr Thr Asn 865 870 875 880 Val Leu Asn Lys Phe Tyr Asp Ser Asn Tyr Pro Glu Phe Pro Val Leu 885 890 895 Arg Asp Arg Met Lys Glu Ile Leu Ser Asn Ala Glu Glu Leu Glu Gln 900 905 910 Val Val Gln Leu Val Gly Lys Ser Ala Leu Ser Asp Ser Asp Lys Ile 915 920 925 Thr Leu Asp Val Ala Thr Leu Ile Lys Glu Asp Phe Leu Gln Gln Asn 930 935 940 Gly Tyr Ser Thr Tyr Asp Ala Phe Cys Pro Ile Trp Lys Thr Phe Asp 945 950 955 960 Met Met Arg Ala Phe Ile Ser Tyr His Asp Glu Ala Gln Lys Ala Val 965 970 975 Ala Asn Gly Ala Asn Trp Ser Lys Leu Ala Asp Ser Thr Gly Asp Val 980 985 990 Lys His Ala Val Ser Ser Ser Lys Phe Phe Glu Pro Ser Arg Gly Glu 995 1000 1005 Lys Glu Val His Gly Glu Phe Glu Lys Leu Leu Ser Thr Met Gln Glu 1010 1015 1020 Arg Phe Ala Glu Ser Thr Asp 1025 1030 15 3147 DNA Candida tropicalis CDS (1)..(3144) 15 atg att gga tgt gcc atg tac gaa ttg gtt aaa gtt ggt cat gat aat 48 Met Ile Gly Cys Ala Met Tyr Glu Leu Val Lys Val Gly His Asp Asn 1 5 10 15 tta gtt ggg gaa gtt att aga att aat ggt gat aaa gca acc att caa 96 Leu Val Gly Glu Val Ile Arg Ile Asn Gly Asp Lys Ala Thr Ile Gln 20 25 30 gtt tat gaa gaa act gca ggg gtc act gtt ggt gat cca gtt tta aga 144 Val Tyr Glu Glu Thr Ala Gly Val Thr Val Gly Asp Pro Val Leu Arg 35 40 45 act ggt aaa cca tta tct gtt gaa tta ggt cct ggt tta atg gaa act 192 Thr Gly Lys Pro Leu Ser Val Glu Leu Gly Pro Gly Leu Met Glu Thr 50 55 60 att tat gat ggt att caa aga cct tta aaa gcc att aaa gat gaa tcc 240 Ile Tyr Asp Gly Ile Gln Arg Pro Leu Lys Ala Ile Lys Asp Glu Ser 65 70 75 80 caa tct att tat atc cca aga ggt att gat gtt cct gct tta tca aga 288 Gln Ser Ile Tyr Ile Pro Arg Gly Ile Asp Val Pro Ala Leu Ser Arg 85 90 95 act gtt caa tat gat ttc act cca ggt caa ttg aaa gtt ggt gat cat 336 Thr Val Gln Tyr Asp Phe Thr Pro Gly Gln Leu Lys Val Gly Asp His 100 105 110 atc act ggt ggg gac att ttt ggt tct att tat gaa aac tct tta ttg 384 Ile Thr Gly Gly Asp Ile Phe
Gly Ser Ile Tyr Glu AsnSer Leu Leu 115 120 125 gat gac cat aag att ttg tta cct cca aga gca agaggt act att act 432 Asp Asp His Lys Ile Leu Leu Pro Pro Arg Ala Arg GlyThr Ile Thr 130 135 140 tct att gct gaa gcc ggt tct tat aat gtt gaa gaacca gtt ttg gaa 480 Ser Ile Ala Glu Ala Gly Ser Tyr Asn Val Glu Glu ProVal Leu Glu 145 150 155 160 gtt gaa ttt gat ggt aag aaa cat aaa tac tctatg atg cat aca tgg 528 Val Glu Phe Asp Gly Lys Lys His Lys Tyr Ser MetMet His Thr Trp 165 170 175 cca gtt aga gtt cca aga cca gtt gct gaa aaattg act gct gat cat 576 Pro Val Arg Val Pro Arg Pro Val Ala Glu Lys LeuThr Ala Asp His 180 185 190 cca ttg ttg acc ggt caa aga gtc ttg gat tcttta ttc cca tgt gtt 624 Pro Leu Leu Thr Gly Gln Arg Val Leu Asp Ser LeuPhe Pro Cys Val 195 200 205 caa ggt ggt act act tgt atc cca ggg gct tttggt tgt ggt aaa act 672 Gln Gly Gly Thr Thr Cys Ile Pro Gly Ala Phe GlyCys Gly Lys Thr 210 215 220 gtt att tct caa tct ttg tcc aaa ttc tcc aactct gat gtt att atc 720 Val Ile Ser Gln Ser Leu Ser Lys Phe Ser Asn SerAsp Val Ile Ile 225 230 235 240 tat gtt ggt tgt ttc act aaa ggt act caagtc atg atg gct gat ggt 768 Tyr Val Gly Cys Phe Thr Lys Gly Thr Gln ValMet Met Ala Asp Gly 245 250 255 gcc gac aaa tct att gaa tct att gaa gttggt gac aaa gtc atg ggt 816 Ala Asp Lys Ser Ile Glu Ser Ile Glu Val GlyAsp Lys Val Met Gly 260 265 270 aaa gat ggt atg cca aga gaa gtt gtt ggctta cca aga ggt tat gat 864 Lys Asp Gly Met Pro Arg Glu Val Val Gly LeuPro Arg Gly Tyr Asp 275 280 285 gat atg tac aag gtt cgt caa ctt tct agtact aga cgt aat gct aaa 912 Asp Met Tyr Lys Val Arg Gln Leu Ser Ser ThrArg Arg Asn Ala Lys 290 295 300 tcc gaa ggc ttg atg gat ttc act gtt tctgct gat cat aaa ctt atc 960 Ser Glu Gly Leu Met Asp Phe Thr Val Ser AlaAsp His Lys Leu Ile 305 310 315 320 ttg aaa act aaa caa gat gtc aag attgct aca cgt aaa att ggt ggc 1008 Leu Lys Thr Lys Gln Asp Val Lys Ile AlaThr Arg Lys Ile Gly Gly 325 330 335 aac acc tat act ggt gtt act ttc tatgtt ttg gaa aag act aag act 1056 Asn Thr Tyr Thr Gly Val 
Thr Phe Tyr ValLeu Glu Lys Thr Lys Thr 340 345 350 ggt att gaa tta gtt aaa gcc aag actaaa gtt ttc ggt cat cat atc 1104 Gly Ile Glu Leu Val Lys Ala Lys Thr LysVal Phe Gly His His Ile 355 360 365 cat ggt caa aat ggc gct gaa gaa aaagct gct act ttt gct gct ggc 1152 His Gly Gln Asn Gly Ala Glu Glu Lys AlaAla Thr Phe Ala Ala Gly 370 375 380 att gac tct aaa gaa tac att gat tggatc att gaa gct aga gat tat 1200 Ile Asp Ser Lys Glu Tyr Ile Asp Trp IleIle Glu Ala Arg Asp Tyr 385 390 395 400 gta caa gtt gat gaa att gtc aagacc agc acc act caa atg atc aac 1248 Val Gln Val Asp Glu Ile Val Lys ThrSer Thr Thr Gln Met Ile Asn 405 410 415 cca gtt cat ttt gaa tct ggt aaactc ggt aac tgg tta cac gaa cac 1296 Pro Val His Phe Glu Ser Gly Lys LeuGly Asn Trp Leu His Glu His 420 425 430 aag caa aac aaa tca ctt gct ccacaa ttg ggt tac ttg ttg ggt act 1344 Lys Gln Asn Lys Ser Leu Ala Pro GlnLeu Gly Tyr Leu Leu Gly Thr 435 440 445 tgg gct ggt att gga aat gtt aaatct tct gct ttc acc atg aac tcc 1392 Trp Ala Gly Ile Gly Asn Val Lys SerSer Ala Phe Thr Met Asn Ser 450 455 460 aaa gat gat gtt aaa tta gct acaaga att atg aac tac tct tca aaa 1440 Lys Asp Asp Val Lys Leu Ala Thr ArgIle Met Asn Tyr Ser Ser Lys 465 470 475 480 ttg ggc atg act tgt tct tctact gaa tcc ggt gaa ctc aat gtc gct 1488 Leu Gly Met Thr Cys Ser Ser ThrGlu Ser Gly Glu Leu Asn Val Ala 485 490 495 gaa aac gaa gaa gaa ttt ttcaat aac ctt ggt gct gaa aag gat gaa 1536 Glu Asn Glu Glu Glu Phe Phe AsnAsn Leu Gly Ala Glu Lys Asp Glu 500 505 510 gct ggt gat ttc act ttt gatgaa ttt acc gat gct atg gat gaa ttg 1584 Ala Gly Asp Phe Thr Phe Asp GluPhe Thr Asp Ala Met Asp Glu Leu 515 520 525 act atc aat gtt cat ggt gcagct gca agc aag aag aac aat ttg ttg 1632 Thr Ile Asn Val His Gly Ala AlaAla Ser Lys Lys Asn Asn Leu Leu 530 535 540 tgg aat gct ttg aaa tct cttggt ttc aga gcc aag tct act gat att 1680 Trp Asn Ala Leu Lys Ser Leu GlyPhe Arg Ala Lys Ser Thr Asp Ile 545 550 555 560 gtc aag agt att cct caacat att gct gtt gat gat att gtt gtc aga 1728 Val Lys 
Ser Ile Pro Gln HisIle Ala Val Asp Asp Ile Val Val Arg 565 570 575 gaa tct ttg att gcc ggttta gtt gat gct gct ggt aat gtt gaa acc 1776 Glu Ser Leu Ile Ala Gly LeuVal Asp Ala Ala Gly Asn Val Glu Thr 580 585 590 aaa tcc aat ggt tct attgaa gct gtt gtt aga act tct ttc aga cat 1824 Lys Ser Asn Gly Ser Ile GluAla Val Val Arg Thr Ser Phe Arg His 595 600 605 gtc gct aga ggt ctt gtcaag att gct cat tct ttg ggt att gaa tca 1872 Val Ala Arg Gly Leu Val LysIle Ala His Ser Leu Gly Ile Glu Ser 610 615 620 tct att aat att aaa gatact cac att gat gct gct ggt gtt aga caa 1920 Ser Ile Asn Ile Lys Asp ThrHis Ile Asp Ala Ala Gly Val Arg Gln 625 630 635 640 gaa ttt gct tgt attgtc aat ttg act ggt gct cca ctt gct ggt gtt 1968 Glu Phe Ala Cys Ile ValAsn Leu Thr Gly Ala Pro Leu Ala Gly Val 645 650 655 ctt tct aaa tgt gcactt gca aga aac caa act cca gtt gtc aaa ttt 2016 Leu Ser Lys Cys Ala LeuAla Arg Asn Gln Thr Pro Val Val Lys Phe 660 665 670 acc aga gac cca gttttg ttc aac ttt gat ttg atc aaa tct gca aaa 2064 Thr Arg Asp Pro Val LeuPhe Asn Phe Asp Leu Ile Lys Ser Ala Lys 675 680 685 gaa aac tat tat ggtatt act ttg gct gaa gaa act gat cat caa ttc 2112 Glu Asn Tyr Tyr Gly IleThr Leu Ala Glu Glu Thr Asp His Gln Phe 690 695 700 ctt tta tcc aac atggcc ttg gtg cac aac tgt ggt gaa cgt ggt aat 2160 Leu Leu Ser Asn Met AlaLeu Val His Asn Cys Gly Glu Arg Gly Asn 705 710 715 720 gag atg gct gaagtt ttg atg gaa ttc cca gaa ttg ttt act gaa att 2208 Glu Met Ala Glu ValLeu Met Glu Phe Pro Glu Leu Phe Thr Glu Ile 725 730 735 tct ggt aga aaagaa cca att atg aaa cgt acc act ttg gtt gcc aat 2256 Ser Gly Arg Lys GluPro Ile Met Lys Arg Thr Thr Leu Val Ala Asn 740 745 750 act tct aat atgcca gtc gct gcc aga gaa gct tct att tat act ggt 2304 Thr Ser Asn Met ProVal Ala Ala Arg Glu Ala Ser Ile Tyr Thr Gly 755 760 765 att aca ttg gctgaa tat ttc aga gat caa ggt aag aat gtt tct atg 2352 Ile Thr Leu Ala GluTyr Phe Arg Asp Gln Gly Lys Asn Val Ser Met 770 775 780 att gct gat tcttct tca cgt tgg gct gaa gct ttg aga gaa att tct 
2400 Ile Ala Asp Ser SerSer Arg Trp Ala Glu Ala Leu Arg Glu Ile Ser 785 790 795 800 ggt aga ttgggt gaa atg cct gct gat caa ggt ttc cca gct tat ttg 2448 Gly Arg Leu GlyGlu Met Pro Ala Asp Gln Gly Phe Pro Ala Tyr Leu 805 810 815 ggt gct aaattg gct tct ttc tat gag cgt gcc ggt aaa gcc act gct 2496 Gly Ala Lys LeuAla Ser Phe Tyr Glu Arg Ala Gly Lys Ala Thr Ala 820 825 830 ttg ggt tcacca gat aga gtt ggt tca gtt tct att gtt gct gct gtt 2544 Leu Gly Ser ProAsp Arg Val Gly Ser Val Ser Ile Val Ala Ala Val 835 840 845 tct cca gctggt ggt gat ttc tct gat cca gtt act act tct act ttg 2592 Ser Pro Ala GlyGly Asp Phe Ser Asp Pro Val Thr Thr Ser Thr Leu 850 855 860 ggt att actcaa gtt ttc tgg ggg ttg gat aag aaa ttg gcc caa aga 2640 Gly Ile Thr GlnVal Phe Trp Gly Leu Asp Lys Lys Leu Ala Gln Arg 865 870 875 880 aaa catttc cca tct att aac acc agt gtt tct tat tct aaa tac acc 2688 Lys His PhePro Ser Ile Asn Thr Ser Val Ser Tyr Ser Lys Tyr Thr 885 890 895 aat gttttg aac aaa tac tat gat tcc aac tat cca gaa ttc cca caa 2736 Asn Val LeuAsn Lys Tyr Tyr Asp Ser Asn Tyr Pro Glu Phe Pro Gln 900 905 910 ttg agagac aaa att aga gaa att tta tct aat gct gaa gaa ttg gaa 2784 Leu Arg AspLys Ile Arg Glu Ile Leu Ser Asn Ala Glu Glu Leu Glu 915 920 925 caa gttgtt caa tta gtt ggt aaa tct gca ttg tct gat tct gat aag 2832 Gln Val ValGln Leu Val Gly Lys Ser Ala Leu Ser Asp Ser Asp Lys 930 935 940 att acttta gat gtt gct acc ttg att aaa gaa gat ttc ttg caa caa 2880 Ile Thr LeuAsp Val Ala Thr Leu Ile Lys Glu Asp Phe Leu Gln Gln 945 950 955 960 aatggt tat tct tca tat gat gca ttc tgt cca att tgg aag act ttt 2928 Asn GlyTyr Ser Ser Tyr Asp Ala Phe Cys Pro Ile Trp Lys Thr Phe 965 970 975 gatatg atg aga gca ttt att tca tat tat gat gaa gca caa aaa gca 2976 Asp MetMet Arg Ala Phe Ile Ser Tyr Tyr Asp Glu Ala Gln Lys Ala 980 985 990 attgcc aat ggt gct caa tgg tct aaa tta gct gaa agt act agt gat 3024 Ile AlaAsn Gly Ala Gln Trp Ser Lys Leu Ala Glu Ser Thr Ser Asp 995 1000 1005gtt aaa cat gct gtt tct tca gct aaa ttc ttt gaa 
cca tca aga ggt 3072 ValLys His Ala Val Ser Ser Ala Lys Phe Phe Glu Pro Ser Arg Gly 1010 10151020 caa aaa gaa ggt gaa aaa gaa ttt gga gat tta tta acc act atc tcc3120 Gln Lys Glu Gly Glu Lys Glu Phe Gly Asp Leu Leu Thr Thr Ile Ser1025 1030 1035 1040 gaa aga ttt gct gaa gct tca gaa taa 3147 Glu Arg PheAla Glu Ala Ser Glu 1045 16 1048 PRT Candida tropicalis 16 Met Ile GlyCys Ala Met Tyr Glu Leu Val Lys Val Gly His Asp Asn 1 5 10 15 Leu ValGly Glu Val Ile Arg Ile Asn Gly Asp Lys Ala Thr Ile Gln 20 25 30 Val TyrGlu Glu Thr Ala Gly Val Thr Val Gly Asp Pro Val Leu Arg 35 40 45 Thr GlyLys Pro Leu Ser Val Glu Leu Gly Pro Gly Leu Met Glu Thr 50 55 60 Ile TyrAsp Gly Ile Gln Arg Pro Leu Lys Ala Ile Lys Asp Glu Ser 65 70 75 80 GlnSer Ile Tyr Ile Pro Arg Gly Ile Asp Val Pro Ala Leu Ser Arg 85 90 95 ThrVal Gln Tyr Asp Phe Thr Pro Gly Gln Leu Lys Val Gly Asp His 100 105 110Ile Thr Gly Gly Asp Ile Phe Gly Ser Ile Tyr Glu Asn Ser Leu Leu 115 120125 Asp Asp His Lys Ile Leu Leu Pro Pro Arg Ala Arg Gly Thr Ile Thr 130135 140 Ser Ile Ala Glu Ala Gly Ser Tyr Asn Val Glu Glu Pro Val Leu Glu145 150 155 160 Val Glu Phe Asp Gly Lys Lys His Lys Tyr Ser Met Met HisThr Trp 165 170 175 Pro Val Arg Val Pro Arg Pro Val Ala Glu Lys Leu ThrAla Asp His 180 185 190 Pro Leu Leu Thr Gly Gln Arg Val Leu Asp Ser LeuPhe Pro Cys Val 195 200 205 Gln Gly Gly Thr Thr Cys Ile Pro Gly Ala PheGly Cys Gly Lys Thr 210 215 220 Val Ile Ser Gln Ser Leu Ser Lys Phe SerAsn Ser Asp Val Ile Ile 225 230 235 240 Tyr Val Gly Cys Phe Thr Lys GlyThr Gln Val Met Met Ala Asp Gly 245 250 255 Ala Asp Lys Ser Ile Glu SerIle Glu Val Gly Asp Lys Val Met Gly 260 265 270 Lys Asp Gly Met Pro ArgGlu Val Val Gly Leu Pro Arg Gly Tyr Asp 275 280 285 Asp Met Tyr Lys ValArg Gln Leu Ser Ser Thr Arg Arg Asn Ala Lys 290 295 300 Ser Glu Gly LeuMet Asp Phe Thr Val Ser Ala Asp His Lys Leu Ile 305 310 315 320 Leu LysThr Lys Gln Asp Val Lys Ile Ala Thr Arg Lys Ile Gly Gly 325 330 335 AsnThr Tyr Thr Gly Val Thr Phe Tyr Val Leu Glu Lys Thr Lys Thr 340 
345 350Gly Ile Glu Leu Val Lys Ala Lys Thr Lys Val Phe Gly His His Ile 355 360365 His Gly Gln Asn Gly Ala Glu Glu Lys Ala Ala Thr Phe Ala Ala Gly 370375 380 Ile Asp Ser Lys Glu Tyr Ile Asp Trp Ile Ile Glu Ala Arg Asp Tyr385 390 395 400 Val Gln Val Asp Glu Ile Val Lys Thr Ser Thr Thr Gln MetIle Asn 405 410 415 Pro Val His Phe Glu Ser Gly Lys Leu Gly Asn Trp LeuHis Glu His 420 425 430 Lys Gln Asn Lys Ser Leu Ala Pro Gln Leu Gly TyrLeu Leu Gly Thr 435 440 445 Trp Ala Gly Ile Gly Asn Val Lys Ser Ser AlaPhe Thr Met Asn Ser 450 455 460 Lys Asp Asp Val Lys Leu Ala Thr Arg IleMet Asn Tyr Ser Ser Lys 465 470 475 480 Leu Gly Met Thr Cys Ser Ser ThrGlu Ser Gly Glu Leu Asn Val Ala 485 490 495 Glu Asn Glu Glu Glu Phe PheAsn Asn Leu Gly Ala Glu Lys Asp Glu 500 505 510 Ala Gly Asp Phe Thr PheAsp Glu Phe Thr Asp Ala Met Asp Glu Leu 515 520 525 Thr Ile Asn Val HisGly Ala Ala Ala Ser Lys Lys Asn Asn Leu Leu 530 535 540 Trp Asn Ala LeuLys Ser Leu Gly Phe Arg Ala Lys Ser Thr Asp Ile 545 550 555 560 Val LysSer Ile Pro Gln His Ile Ala Val Asp Asp Ile Val Val Arg 565 570 575 GluSer Leu Ile Ala Gly Leu Val Asp Ala Ala Gly Asn Val Glu Thr 580 585 590Lys Ser Asn Gly Ser Ile Glu Ala Val Val Arg Thr Ser Phe Arg His 595 600605 Val Ala Arg Gly Leu Val Lys Ile Ala His Ser Leu Gly Ile Glu Ser 610615 620 Ser Ile Asn Ile Lys Asp Thr His Ile Asp Ala Ala Gly Val Arg Gln625 630 635 640 Glu Phe Ala Cys Ile Val Asn Leu Thr Gly Ala Pro Leu AlaGly Val 645 650 655 Leu Ser Lys Cys Ala Leu Ala Arg Asn Gln Thr Pro ValVal Lys Phe 660 665 670 Thr Arg Asp Pro Val Leu Phe Asn Phe Asp Leu IleLys Ser Ala Lys 675 680 685 Glu Asn Tyr Tyr Gly Ile Thr Leu Ala Glu GluThr Asp His Gln Phe 690 695 700 Leu Leu Ser Asn Met Ala Leu Val His AsnCys Gly Glu Arg Gly Asn 705 710 715 720 Glu Met Ala Glu Val Leu Met GluPhe Pro Glu Leu Phe Thr Glu Ile 725 730 735 Ser Gly Arg Lys Glu Pro IleMet Lys Arg Thr Thr Leu Val Ala Asn 740 745 750 Thr Ser Asn Met Pro ValAla Ala Arg Glu Ala Ser Ile Tyr Thr Gly 755 760 765 Ile Thr Leu Ala GluTyr 
Phe Arg Asp Gln Gly Lys Asn Val Ser Met 770 775 780 Ile Ala Asp Ser Ser Ser Arg Trp Ala Glu Ala Leu Arg Glu Ile Ser 785 790 795 800 Gly Arg Leu Gly Glu Met Pro Ala Asp Gln Gly Phe Pro Ala Tyr Leu 805 810 815 Gly Ala Lys Leu Ala Ser Phe Tyr Glu Arg Ala Gly Lys Ala Thr Ala 820 825 830 Leu Gly Ser Pro Asp Arg Val Gly Ser Val Ser Ile Val Ala Ala Val 835 840 845 Ser Pro Ala Gly Gly Asp Phe Ser Asp Pro Val Thr Thr Ser Thr Leu 850 855 860 Gly Ile Thr Gln Val Phe Trp Gly Leu Asp Lys Lys Leu Ala Gln Arg 865 870 875 880 Lys His Phe Pro Ser Ile Asn Thr Ser Val Ser Tyr Ser Lys Tyr Thr 885 890 895 Asn Val Leu Asn Lys Tyr Tyr Asp Ser Asn Tyr Pro Glu Phe Pro Gln 900 905 910 Leu Arg Asp Lys Ile Arg Glu Ile Leu Ser Asn Ala Glu Glu Leu Glu 915 920 925 Gln Val Val Gln Leu Val Gly Lys Ser Ala Leu Ser Asp Ser Asp Lys 930 935 940 Ile Thr Leu Asp Val Ala Thr Leu Ile Lys Glu Asp Phe Leu Gln Gln 945 950 955 960 Asn Gly Tyr Ser Ser Tyr Asp Ala Phe Cys Pro Ile Trp Lys Thr Phe 965 970 975 Asp Met Met Arg Ala Phe Ile Ser Tyr Tyr Asp Glu Ala Gln Lys Ala 980 985 990 Ile Ala Asn Gly Ala Gln Trp Ser Lys Leu Ala Glu Ser Thr Ser Asp 995 1000 1005 Val Lys His Ala Val Ser Ser Ala Lys Phe Phe Glu Pro Ser Arg Gly 1010 1015 1020 Gln Lys Glu Gly Glu Lys Glu Phe Gly Asp Leu Leu Thr Thr Ile Ser 1025 1030 1035 1040 Glu Arg Phe Ala Glu Ala Ser Glu 1045 17 3033 DNA Chlamydomonas eugametos CDS (1)..(3030) 17 atg cct att ggt gtt cca cgt att att tat tgc tgg gga gaa gaa ctt 48 Met Pro Ile Gly Val Pro Arg Ile Ile Tyr Cys Trp Gly Glu Glu Leu 1 5 10 15 ccc gca caa tgg act gat att tat aac ttt att ttt aga cgt cga atg 96 Pro Ala Gln Trp Thr Asp Ile Tyr Asn Phe Ile Phe Arg Arg Arg Met 20 25 30 gtc ttt tta atg caa tat ttg gat gat gaa ctt tgt aat caa atc tgt 144 Val Phe Leu Met Gln Tyr Leu Asp Asp Glu Leu Cys Asn Gln Ile Cys 35 40 45 ggt tta tta att aat att cat atg gaa gac cgt tca aaa gaa ttg gaa 192 Gly Leu Leu Ile Asn Ile His Met Glu Asp Arg Ser Lys Glu Leu Glu 50 55 60 aaa aaa gaa att gaa cgt agt ggt tta ttc aaa gga ggt cca aaa aca 240 Lys Lys Glu
Ile Glu Arg Ser Gly Leu Phe Lys Gly Gly Pro LysThr 65 70 75 80 caa aaa ggt ggg aca ggt gcc ggc gaa aca ggt gca tca agtatt caa 288 Gln Lys Gly Gly Thr Gly Ala Gly Glu Thr Gly Ala Ser Ser IleGln 85 90 95 aat aaa aaa agc aat agt tca tca ttt gaa gat tta tta gct gcagat 336 Asn Lys Lys Ser Asn Ser Ser Ser Phe Glu Asp Leu Leu Ala Ala Asp100 105 110 gag gat tta ggt att gat gaa aat aat aca tta gaa caa tat acactt 384 Glu Asp Leu Gly Ile Asp Glu Asn Asn Thr Leu Glu Gln Tyr Thr Leu115 120 125 caa aaa att aca atg gaa tgg tta aat tgg aat gct caa ttt tttgat 432 Gln Lys Ile Thr Met Glu Trp Leu Asn Trp Asn Ala Gln Phe Phe Asp130 135 140 tat tca gat gaa cct tat ctt ttt tat tta gcc gaa atg cta tcaaaa 480 Tyr Ser Asp Glu Pro Tyr Leu Phe Tyr Leu Ala Glu Met Leu Ser Lys145 150 155 160 gat ttt aat aaa gga gat gct cgt atg tta ttt tca aat aataat aaa 528 Asp Phe Asn Lys Gly Asp Ala Arg Met Leu Phe Ser Asn Asn AsnLys 165 170 175 ttt tca atg cca ttt tct caa atg ctt aat aca gga tcg atgtcc gat 576 Phe Ser Met Pro Phe Ser Gln Met Leu Asn Thr Gly Ser Met SerAsp 180 185 190 cca cgt cgc cca cag tct acg aac ggg gct aat tgg aat tcaagt gaa 624 Pro Arg Arg Pro Gln Ser Thr Asn Gly Ala Asn Trp Asn Ser SerGlu 195 200 205 caa aat aat tct tta gac att tat tct cct ttc cgt atg ttagct aat 672 Gln Asn Asn Ser Leu Asp Ile Tyr Ser Pro Phe Arg Met Leu AlaAsn 210 215 220 ttt gaa gcc caa gat tat gat ttt aaa caa att aat cca tcttta gct 720 Phe Glu Ala Gln Asp Tyr Asp Phe Lys Gln Ile Asn Pro Ser LeuAla 225 230 235 240 tca aaa gaa gaa gtt ttc aaa ctt ttt aat aat act atttta aaa aat 768 Ser Lys Glu Glu Val Phe Lys Leu Phe Asn Asn Thr Ile LeuLys Asn 245 250 255 gga ggt caa cgt aat aat aat atg tcc aaa tta tta acagaa tta gca 816 Gly Gly Gln Arg Asn Asn Asn Met Ser Lys Leu Leu Thr GluLeu Ala 260 265 270 caa cgt aat tgg gaa aat aaa aca aat tca caa gaa aattta tat aaa 864 Gln Arg Asn Trp Glu Asn Lys Thr Asn Ser Gln Glu Asn LeuTyr Lys 275 280 285 agc aca gaa aaa gct ttg agt caa cgt aat tta cga aaagaa tat att 912 Ser Thr Glu Lys 
Ala Leu Ser Gln Arg Asn Leu Arg Lys GluTyr Ile 290 295 300 aaa gac cgt act tta aat aat tat tca agt gac ccg tttaat aca aaa 960 Lys Asp Arg Thr Leu Asn Asn Tyr Ser Ser Asp Pro Phe AsnThr Lys 305 310 315 320 ggc tac gtc aac gca caa ggt gcg tcg acg ggg ccaagc cct cgt aca 1008 Gly Tyr Val Asn Ala Gln Gly Ala Ser Thr Gly Pro SerPro Arg Thr 325 330 335 cgt ggt atg cat gcc gac gga tcc tta aat tat ttagat ttc tat tct 1056 Arg Gly Met His Ala Asp Gly Ser Leu Asn Tyr Leu AspPhe Tyr Ser 340 345 350 tat aat gat tct tat aat gat ttc aaa act gca cctcgt gga aaa caa 1104 Tyr Asn Asp Ser Tyr Asn Asp Phe Lys Thr Ala Pro ArgGly Lys Gln 355 360 365 gct gaa cgt gcc ttc caa gaa gag gaa tct aaa aaagtt ttt gtt att 1152 Ala Glu Arg Ala Phe Gln Glu Glu Glu Ser Lys Lys ValPhe Val Ile 370 375 380 att aac tcg ttt ggt ggt tct gtt ggt aat ggg attact gtg cat gat 1200 Ile Asn Ser Phe Gly Gly Ser Val Gly Asn Gly Ile ThrVal His Asp 385 390 395 400 gca ctt caa ttt att aaa gct ggg tca tta acatta gct tta ggt gtt 1248 Ala Leu Gln Phe Ile Lys Ala Gly Ser Leu Thr LeuAla Leu Gly Val 405 410 415 gca gct tcc gcc gct tca tta gcc ctt gct ggtggt act att ggt gag 1296 Ala Ala Ser Ala Ala Ser Leu Ala Leu Ala Gly GlyThr Ile Gly Glu 420 425 430 cgt tat gtt acg gaa ggt tgc cat gtt atg attcac caa cca gaa tgc 1344 Arg Tyr Val Thr Glu Gly Cys His Val Met Ile HisGln Pro Glu Cys 435 440 445 ttg act tct gac cac act gta tta aca act cgcggt tgg att cct att 1392 Leu Thr Ser Asp His Thr Val Leu Thr Thr Arg GlyTrp Ile Pro Ile 450 455 460 gct gac gta act ctt gat gac aaa gta gcg gtttta gat aac aat aca 1440 Ala Asp Val Thr Leu Asp Asp Lys Val Ala Val LeuAsp Asn Asn Thr 465 470 475 480 ggt gaa atg tca tat caa aat cca caa aaagta cat aaa tat gac tat 1488 Gly Glu Met Ser Tyr Gln Asn Pro Gln Lys ValHis Lys Tyr Asp Tyr 485 490 495 gaa ggt cca atg tat gaa gta aaa aca gctgga gtt gac tta ttt gtt 1536 Glu Gly Pro Met Tyr Glu Val Lys Thr Ala GlyVal Asp Leu Phe Val 500 505 510 aca cca aac cac cgt atg tat gtt aac acaacg aat aat act acg aac 1584 
Thr Pro Asn His Arg Met Tyr Val Asn Thr ThrAsn Asn Thr Thr Asn 515 520 525 caa aac tat aat tta gtt gaa gct tca tctatt ttt gga aaa aaa gta 1632 Gln Asn Tyr Asn Leu Val Glu Ala Ser Ser IlePhe Gly Lys Lys Val 530 535 540 cgt tac aaa aat gat gct atc tgg aat aaaacc gat tat caa ttt att 1680 Arg Tyr Lys Asn Asp Ala Ile Trp Asn Lys ThrAsp Tyr Gln Phe Ile 545 550 555 560 tta cct gaa act gca acg ctt aca ggtcat aca aat aaa ata agc tct 1728 Leu Pro Glu Thr Ala Thr Leu Thr Gly HisThr Asn Lys Ile Ser Ser 565 570 575 aca cct gcc atc caa ccc gaa atg aacgct tgg cta act ttc ttt gga 1776 Thr Pro Ala Ile Gln Pro Glu Met Asn AlaTrp Leu Thr Phe Phe Gly 580 585 590 tta tgg atc gct aac gga cat act acgaaa att gct gaa aaa aca gca 1824 Leu Trp Ile Ala Asn Gly His Thr Thr LysIle Ala Glu Lys Thr Ala 595 600 605 gaa aat aat caa caa aaa caa cga tataag gta att ctg act caa gtt 1872 Glu Asn Asn Gln Gln Lys Gln Arg Tyr LysVal Ile Leu Thr Gln Val 610 615 620 aaa gaa gat gtt tgt gat att att gaacaa act tta aat aaa tta gga 1920 Lys Glu Asp Val Cys Asp Ile Ile Glu GlnThr Leu Asn Lys Leu Gly 625 630 635 640 ttt aat ttt att cgt agt ggt aaagat tac aca att gaa aat aaa caa 1968 Phe Asn Phe Ile Arg Ser Gly Lys AspTyr Thr Ile Glu Asn Lys Gln 645 650 655 cta tgg tct tac tta aat cct ttcgat aac ggg gct tta aat aaa tat 2016 Leu Trp Ser Tyr Leu Asn Pro Phe AspAsn Gly Ala Leu Asn Lys Tyr 660 665 670 tta cct gat tgg gta tgg gaa ttaagt tca caa caa tgt aaa att tta 2064 Leu Pro Asp Trp Val Trp Glu Leu SerSer Gln Gln Cys Lys Ile Leu 675 680 685 tta aat agc tta tgt ctt ggt aattgt ctt ttc act aaa aac gat gac 2112 Leu Asn Ser Leu Cys Leu Gly Asn CysLeu Phe Thr Lys Asn Asp Asp 690 695 700 act tta cat tat ttt agt acg tcagaa cgt ttt gca aat gat gtt agc 2160 Thr Leu His Tyr Phe Ser Thr Ser GluArg Phe Ala Asn Asp Val Ser 705 710 715 720 cgt ttg gcc tta cat gcc ggaaca act tcg act att caa tta gaa gca 2208 Arg Leu Ala Leu His Ala Gly ThrThr Ser Thr Ile Gln Leu Glu Ala 725 730 735 gct cca agt aat cta tat gataca att att ggt cta cct 
gtt gaa gta 2256 Ala Pro Ser Asn Leu Tyr Asp ThrIle Ile Gly Leu Pro Val Glu Val 740 745 750 aac act act cta tgg cgt gtaatt att aat caa agt agt ttc tac tct 2304 Asn Thr Thr Leu Trp Arg Val IleIle Asn Gln Ser Ser Phe Tyr Ser 755 760 765 tat tcc act gac aaa tca agcgca cta aat tta tct aat aat gta gca 2352 Tyr Ser Thr Asp Lys Ser Ser AlaLeu Asn Leu Ser Asn Asn Val Ala 770 775 780 tgc tac gtc aac gcg cag agcgcg ttg acg tta gaa caa aat tct caa 2400 Cys Tyr Val Asn Ala Gln Ser AlaLeu Thr Leu Glu Gln Asn Ser Gln 785 790 795 800 aaa atc aat aaa aat acttta gtt tta aca aaa aat aac gta aaa agt 2448 Lys Ile Asn Lys Asn Thr LeuVal Leu Thr Lys Asn Asn Val Lys Ser 805 810 815 caa aca atg cat agt caacgc gca gag cgc gtt gac acg gct ctt tta 2496 Gln Thr Met His Ser Gln ArgAla Glu Arg Val Asp Thr Ala Leu Leu 820 825 830 act caa aaa gag ctt gataac tca tta aat cat gaa att tta att aat 2544 Thr Gln Lys Glu Leu Asp AsnSer Leu Asn His Glu Ile Leu Ile Asn 835 840 845 aaa aac cct ggt act agtcaa tta gaa tgt gta gtt aac cct gaa gtt 2592 Lys Asn Pro Gly Thr Ser GlnLeu Glu Cys Val Val Asn Pro Glu Val 850 855 860 aat aac aca tca act aatgat cgt ttt gtt tac tac aaa ggg cca gta 2640 Asn Asn Thr Ser Thr Asn AspArg Phe Val Tyr Tyr Lys Gly Pro Val 865 870 875 880 tat tgc tta act ggtcct aac aac gta ttc tac gta caa cga aac gga 2688 Tyr Cys Leu Thr Gly ProAsn Asn Val Phe Tyr Val Gln Arg Asn Gly 885 890 895 aaa gct gtg tgg acaggt aac agt tca att caa ggc caa gca tca gat 2736 Lys Ala Val Trp Thr GlyAsn Ser Ser Ile Gln Gly Gln Ala Ser Asp 900 905 910 att tgg att gat agtcaa gaa atc atg aaa att cgt tta gat gta gca 2784 Ile Trp Ile Asp Ser GlnGlu Ile Met Lys Ile Arg Leu Asp Val Ala 915 920 925 gaa att tat tca ttagct act tat cgt ccg cgt cac aaa att tta cgt 2832 Glu Ile Tyr Ser Leu AlaThr Tyr Arg Pro Arg His Lys Ile Leu Arg 930 935 940 gat tta gat cgt gatttt tat cta acg gca act gaa aca att cat tat 2880 Asp Leu Asp Arg Asp PheTyr Leu Thr Ala Thr Glu Thr Ile His Tyr 945 950 955 960 ggt tta gct gatgaa att gct tct aat 
gaa gta atg caa gaa att att 2928 Gly Leu Ala Asp GluIle Ala Ser Asn Glu Val Met Gln Glu Ile Ile 965 970 975 gaa atg aca agtaaa gtt tgg gac tat cat gat aca aaa caa caa cgt 2976 Glu Met Thr Ser LysVal Trp Asp Tyr His Asp Thr Lys Gln Gln Arg 980 985 990 tta cta gaa agtcgt gat tct aca act tct ggg gca gat aca caa tct 3024 Leu Leu Glu Ser ArgAsp Ser Thr Thr Ser Gly Ala Asp Thr Gln Ser 995 1000 1005 caa aat taa3033 Gln Asn 1010 18 1010 PRT Chlamydomonas eugametos 18 Met Pro Ile GlyVal Pro Arg Ile Ile Tyr Cys Trp Gly Glu Glu Leu 1 5 10 15 Pro Ala GlnTrp Thr Asp Ile Tyr Asn Phe Ile Phe Arg Arg Arg Met 20 25 30 Val Phe LeuMet Gln Tyr Leu Asp Asp Glu Leu Cys Asn Gln Ile Cys 35 40 45 Gly Leu LeuIle Asn Ile His Met Glu Asp Arg Ser Lys Glu Leu Glu 50 55 60 Lys Lys GluIle Glu Arg Ser Gly Leu Phe Lys Gly Gly Pro Lys Thr 65 70 75 80 Gln LysGly Gly Thr Gly Ala Gly Glu Thr Gly Ala Ser Ser Ile Gln 85 90 95 Asn LysLys Ser Asn Ser Ser Ser Phe Glu Asp Leu Leu Ala Ala Asp 100 105 110 GluAsp Leu Gly Ile Asp Glu Asn Asn Thr Leu Glu Gln Tyr Thr Leu 115 120 125Gln Lys Ile Thr Met Glu Trp Leu Asn Trp Asn Ala Gln Phe Phe Asp 130 135140 Tyr Ser Asp Glu Pro Tyr Leu Phe Tyr Leu Ala Glu Met Leu Ser Lys 145150 155 160 Asp Phe Asn Lys Gly Asp Ala Arg Met Leu Phe Ser Asn Asn AsnLys 165 170 175 Phe Ser Met Pro Phe Ser Gln Met Leu Asn Thr Gly Ser MetSer Asp 180 185 190 Pro Arg Arg Pro Gln Ser Thr Asn Gly Ala Asn Trp AsnSer Ser Glu 195 200 205 Gln Asn Asn Ser Leu Asp Ile Tyr Ser Pro Phe ArgMet Leu Ala Asn 210 215 220 Phe Glu Ala Gln Asp Tyr Asp Phe Lys Gln IleAsn Pro Ser Leu Ala 225 230 235 240 Ser Lys Glu Glu Val Phe Lys Leu PheAsn Asn Thr Ile Leu Lys Asn 245 250 255 Gly Gly Gln Arg Asn Asn Asn MetSer Lys Leu Leu Thr Glu Leu Ala 260 265 270 Gln Arg Asn Trp Glu Asn LysThr Asn Ser Gln Glu Asn Leu Tyr Lys 275 280 285 Ser Thr Glu Lys Ala LeuSer Gln Arg Asn Leu Arg Lys Glu Tyr Ile 290 295 300 Lys Asp Arg Thr LeuAsn Asn Tyr Ser Ser Asp Pro Phe Asn Thr Lys 305 310 315 320 Gly Tyr ValAsn Ala Gln Gly Ala Ser Thr 
Gly Pro Ser Pro Arg Thr 325 330 335 Arg GlyMet His Ala Asp Gly Ser Leu Asn Tyr Leu Asp Phe Tyr Ser 340 345 350 TyrAsn Asp Ser Tyr Asn Asp Phe Lys Thr Ala Pro Arg Gly Lys Gln 355 360 365Ala Glu Arg Ala Phe Gln Glu Glu Glu Ser Lys Lys Val Phe Val Ile 370 375380 Ile Asn Ser Phe Gly Gly Ser Val Gly Asn Gly Ile Thr Val His Asp 385390 395 400 Ala Leu Gln Phe Ile Lys Ala Gly Ser Leu Thr Leu Ala Leu GlyVal 405 410 415 Ala Ala Ser Ala Ala Ser Leu Ala Leu Ala Gly Gly Thr IleGly Glu 420 425 430 Arg Tyr Val Thr Glu Gly Cys His Val Met Ile His GlnPro Glu Cys 435 440 445 Leu Thr Ser Asp His Thr Val Leu Thr Thr Arg GlyTrp Ile Pro Ile 450 455 460 Ala Asp Val Thr Leu Asp Asp Lys Val Ala ValLeu Asp Asn Asn Thr 465 470 475 480 Gly Glu Met Ser Tyr Gln Asn Pro GlnLys Val His Lys Tyr Asp Tyr 485 490 495 Glu Gly Pro Met Tyr Glu Val LysThr Ala Gly Val Asp Leu Phe Val 500 505 510 Thr Pro Asn His Arg Met TyrVal Asn Thr Thr Asn Asn Thr Thr Asn 515 520 525 Gln Asn Tyr Asn Leu ValGlu Ala Ser Ser Ile Phe Gly Lys Lys Val 530 535 540 Arg Tyr Lys Asn AspAla Ile Trp Asn Lys Thr Asp Tyr Gln Phe Ile 545 550 555 560 Leu Pro GluThr Ala Thr Leu Thr Gly His Thr Asn Lys Ile Ser Ser 565 570 575 Thr ProAla Ile Gln Pro Glu Met Asn Ala Trp Leu Thr Phe Phe Gly 580 585 590 LeuTrp Ile Ala Asn Gly His Thr Thr Lys Ile Ala Glu Lys Thr Ala 595 600 605Glu Asn Asn Gln Gln Lys Gln Arg Tyr Lys Val Ile Leu Thr Gln Val 610 615620 Lys Glu Asp Val Cys Asp Ile Ile Glu Gln Thr Leu Asn Lys Leu Gly 625630 635 640 Phe Asn Phe Ile Arg Ser Gly Lys Asp Tyr Thr Ile Glu Asn LysGln 645 650 655 Leu Trp Ser Tyr Leu Asn Pro Phe Asp Asn Gly Ala Leu AsnLys Tyr 660 665 670 Leu Pro Asp Trp Val Trp Glu Leu Ser Ser Gln Gln CysLys Ile Leu 675 680 685 Leu Asn Ser Leu Cys Leu Gly Asn Cys Leu Phe ThrLys Asn Asp Asp 690 695 700 Thr Leu His Tyr Phe Ser Thr Ser Glu Arg PheAla Asn Asp Val Ser 705 710 715 720 Arg Leu Ala Leu His Ala Gly Thr ThrSer Thr Ile Gln Leu Glu Ala 725 730 735 Ala Pro Ser Asn Leu Tyr Asp ThrIle Ile Gly Leu Pro Val Glu Val 740 745 
750 Asn Thr Thr Leu Trp Arg ValIle Ile Asn Gln Ser Ser Phe Tyr Ser 755 760 765 Tyr Ser Thr Asp Lys SerSer Ala Leu Asn Leu Ser Asn Asn Val Ala 770 775 780 Cys Tyr Val Asn AlaGln Ser Ala Leu Thr Leu Glu Gln Asn Ser Gln 785 790 795 800 Lys Ile AsnLys Asn Thr Leu Val Leu Thr Lys Asn Asn Val Lys Ser 805 810 815 Gln ThrMet His Ser Gln Arg Ala Glu Arg Val Asp Thr Ala Leu Leu 820 825 830 ThrGln Lys Glu Leu Asp Asn Ser Leu Asn His Glu Ile Leu Ile Asn 835 840 845Lys Asn Pro Gly Thr Ser Gln Leu Glu Cys Val Val Asn Pro Glu Val 850 855860 Asn Asn Thr Ser Thr Asn Asp Arg Phe Val Tyr Tyr Lys Gly Pro Val 865870 875 880 Tyr Cys Leu Thr Gly Pro Asn Asn Val Phe Tyr Val Gln Arg AsnGly 885 890 895 Lys Ala Val Trp Thr Gly Asn Ser Ser Ile Gln Gly Gln AlaSer Asp 900 905 910 Ile Trp Ile Asp Ser Gln Glu Ile Met Lys Ile Arg LeuAsp Val Ala 915 920 925 Glu Ile Tyr Ser Leu Ala Thr Tyr Arg Pro Arg HisLys Ile Leu Arg 930 935 940 Asp Leu Asp Arg Asp Phe Tyr Leu Thr Ala ThrGlu Thr Ile His Tyr 945 950 955 960 Gly Leu Ala Asp Glu Ile Ala Ser AsnGlu Val Met Gln Glu Ile Ile 965 970 975 Glu Met Thr Ser Lys Val Trp AspTyr His Asp Thr Lys Gln Gln Arg 980 985 990 Leu Leu Glu Ser Arg Asp SerThr Thr Ser Gly Ala Asp Thr Gln Ser 995 1000 1005 Gln Asn 1010 19 2373DNA Mycobacterium tuberculosis CDS (1)..(2370) 19 atg acg cag acc cccgat cgg gaa aag gcg ctc gag ctg gca gtg gcc 48 Met Thr Gln Thr Pro AspArg Glu Lys Ala Leu Glu Leu Ala Val Ala 1 5 10 15 cag atc gag aag agttac ggc aaa ggt tcg gtg atg cgc ctc ggc gac 96 Gln Ile Glu Lys Ser TyrGly Lys Gly Ser Val Met Arg Leu Gly Asp 20 25 30 gag gcg cgt cag ccg atttcg gtc att ccg acc gga tcc atc gca cta 144 Glu Ala Arg Gln Pro Ile SerVal Ile Pro Thr Gly Ser Ile Ala Leu 35 40 45 gac gtg gcc ctg ggc att ggcggc ctg ccg cgt ggc cgg gtg ata gag 192 Asp Val Ala Leu Gly Ile Gly GlyLeu Pro Arg Gly Arg Val Ile Glu 50 55 60 ata tac ggc ccg gag tcg tcg ggtaag acc acc gtg gcg ctg cac gcg 240 Ile Tyr Gly Pro Glu Ser Ser Gly LysThr Thr Val Ala Leu His Ala 65 70 75 80 gtg gcc aac gct 
cag gcc gcc ggtggt gtt gcg gcg ttc atc gac gcc 288 Val Ala Asn Ala Gln Ala Ala Gly GlyVal Ala Ala Phe Ile Asp Ala 85 90 95 gag cac gcg ctg gat ccg gac tat gccaag aag ctc ggt gtc gac acc 336 Glu His Ala Leu Asp Pro Asp Tyr Ala LysLys Leu Gly Val Asp Thr 100 105 110 gat tcg ctg ctg gtc agc cag ccg gacacc ggg gaa cag gca ctc gag 384 Asp Ser Leu Leu Val Ser Gln Pro Asp ThrGly Glu Gln Ala Leu Glu 115 120 125 atc gcc gac atg ctg atc cgc tcg ggtgcg ctt gac atc gtg gtg atc 432 Ile Ala Asp Met Leu Ile Arg Ser Gly AlaLeu Asp Ile Val Val Ile 130 135 140 gac tcg gtg gcg gcg ctg gtg ccg cgcgcg gag ctc gaa ggc gag atg 480 Asp Ser Val Ala Ala Leu Val Pro Arg AlaGlu Leu Glu Gly Glu Met 145 150 155 160 ggc gac agc cac gtc ggg ctg caggcc cgg ctg atg agc cag gcg ctg 528 Gly Asp Ser His Val Gly Leu Gln AlaArg Leu Met Ser Gln Ala Leu 165 170 175 cgg aaa atg acc ggc gcg ctg aataat tcg ggc acc acg gcg atc ttc 576 Arg Lys Met Thr Gly Ala Leu Asn AsnSer Gly Thr Thr Ala Ile Phe 180 185 190 atc aac cag ctc cgc gac aag atcgga gtg atg ttc ggg tcg ccc gag 624 Ile Asn Gln Leu Arg Asp Lys Ile GlyVal Met Phe Gly Ser Pro Glu 195 200 205 acg aca acg ggc gga aag gcg ttgaag ttc tac gcg tcg gtg cgc atg 672 Thr Thr Thr Gly Gly Lys Ala Leu LysPhe Tyr Ala Ser Val Arg Met 210 215 220 gac gtg cgg cga gtc gag acg ctcaag gac ggt acc aac gcg gtc ggc 720 Asp Val Arg Arg Val Glu Thr Leu LysAsp Gly Thr Asn Ala Val Gly 225 230 235 240 aac cgc acc cgg gtc aag gtcgtc aag aac aag tgc ctc gca gag ggc 768 Asn Arg Thr Arg Val Lys Val ValLys Asn Lys Cys Leu Ala Glu Gly 245 250 255 act cgg atc ttc gat ccg gtcacc ggt aca acg cat cgc atc gag gat 816 Thr Arg Ile Phe Asp Pro Val ThrGly Thr Thr His Arg Ile Glu Asp 260 265 270 gtt gtc gat ggg cgc aag cctatt cat gtc gtg gct gct gcc aag gac 864 Val Val Asp Gly Arg Lys Pro IleHis Val Val Ala Ala Ala Lys Asp 275 280 285 gga acg ctg cat gcg cgg cccgtg gtg tcc tgg ttc gac cag gga acg 912 Gly Thr Leu His Ala Arg Pro ValVal Ser Trp Phe Asp Gln Gly Thr 290 295 300 cgg gat gtg atc ggg 
ttg cggatc gcc ggt ggc gcc atc gtg tgg gcg 960 Arg Asp Val Ile Gly Leu Arg IleAla Gly Gly Ala Ile Val Trp Ala 305 310 315 320 aca ccc gat cac aag gtgctg aca gag tac ggc tgg cgt gcc gcc ggg 1008 Thr Pro Asp His Lys Val LeuThr Glu Tyr Gly Trp Arg Ala Ala Gly 325 330 335 gaa ctc cgc aag gga gacagg gtg gcg caa ccg cga cgc ttc gat gga 1056 Glu Leu Arg Lys Gly Asp ArgVal Ala Gln Pro Arg Arg Phe Asp Gly 340 345 350 ttc ggt gac agt gcg ccgatt ccg gcg gat cat gcc cgg ctg ctt ggc 1104 Phe Gly Asp Ser Ala Pro IlePro Ala Asp His Ala Arg Leu Leu Gly 355 360 365 tac ctg atc gga gat ggcagg gat ggt tgg gtg ggg ggc aag act ccg 1152 Tyr Leu Ile Gly Asp Gly ArgAsp Gly Trp Val Gly Gly Lys Thr Pro 370 375 380 atc aac ttc atc aat gttcag cgg gcg ctc att gac gac gtg acg cga 1200 Ile Asn Phe Ile Asn Val GlnArg Ala Leu Ile Asp Asp Val Thr Arg 385 390 395 400 atc gct gcg acg ctcggt tgc gcg gcc cat ccg cag ggg cgt atc tca 1248 Ile Ala Ala Thr Leu GlyCys Ala Ala His Pro Gln Gly Arg Ile Ser 405 410 415 ctc gcg atc gct catcga ccc ggt gag cgc aac ggt gtg gca gac ctt 1296 Leu Ala Ile Ala His ArgPro Gly Glu Arg Asn Gly Val Ala Asp Leu 420 425 430 tgt cag cag gcc ggtatc tac ggc aag ctc gcg tgg gag aag acg att 1344 Cys Gln Gln Ala Gly IleTyr Gly Lys Leu Ala Trp Glu Lys Thr Ile 435 440 445 ccg aat tgg ttc ttcgag ccg gac atc gcg gcc gac att gtc ggc aat 1392 Pro Asn Trp Phe Phe GluPro Asp Ile Ala Ala Asp Ile Val Gly Asn 450 455 460 ctg ctc ttc ggc ctgttc gaa agc gac ggg tgg gtg agc cgg gaa cag 1440 Leu Leu Phe Gly Leu PheGlu Ser Asp Gly Trp Val Ser Arg Glu Gln 465 470 475 480 acc ggg gca cttcgg gtc ggt tac acg acg acc tct gaa caa ctc gcg 1488 Thr Gly Ala Leu ArgVal Gly Tyr Thr Thr Thr Ser Glu Gln Leu Ala 485 490 495 cat cag att cattgg ctg ctg ctg cgg ttc ggt gtc ggg agc acc gtt 1536 His Gln Ile His TrpLeu Leu Leu Arg Phe Gly Val Gly Ser Thr Val 500 505 510 cga gat tac gatccg acc cag aag cgg ccg agc atc gtc aac ggt cga 1584 Arg Asp Tyr Asp ProThr Gln Lys Arg Pro Ser Ile Val Asn Gly Arg 515 520 525 cgg 
atc cag agcaaa cgt caa gtg ttc gag gtc cgg atc tcg ggt atg 1632 Arg Ile Gln Ser LysArg Gln Val Phe Glu Val Arg Ile Ser Gly Met 530 535 540 gat aac gtc acggca ttc gcg gag tca gtt ccc atg tgg ggg ccg cgc 1680 Asp Asn Val Thr AlaPhe Ala Glu Ser Val Pro Met Trp Gly Pro Arg 545 550 555 560 ggt gcc gcgctt atc cag gcg att cca gaa gcc acg cag ggg cgg cgt 1728 Gly Ala Ala LeuIle Gln Ala Ile Pro Glu Ala Thr Gln Gly Arg Arg 565 570 575 cgt gga tcgcaa gcg aca tat ctg gct gca gag atg acc gat gcc gtg 1776 Arg Gly Ser GlnAla Thr Tyr Leu Ala Ala Glu Met Thr Asp Ala Val 580 585 590 ctg aat tatctg gac gag cgc ggc gtg acc gcg cag gag gcc gcg gcc 1824 Leu Asn Tyr LeuAsp Glu Arg Gly Val Thr Ala Gln Glu Ala Ala Ala 595 600 605 atg atc ggtgta gct tcc ggg gac ccc cgc ggt gga atg aag cag gtc 1872 Met Ile Gly ValAla Ser Gly Asp Pro Arg Gly Gly Met Lys Gln Val 610 615 620 tta ggt gccagc cgc ctt cgt cgg gat cgc gtg cag gcg ctc gcg gat 1920 Leu Gly Ala SerArg Leu Arg Arg Asp Arg Val Gln Ala Leu Ala Asp 625 630 635 640 gcc ctggat gac aaa ttc ctg cac gac atg ctg gcg gaa gaa ctc cgc 1968 Ala Leu AspAsp Lys Phe Leu His Asp Met Leu Ala Glu Glu Leu Arg 645 650 655 tat tccgtg atc cga gaa gtg ctg cca acg cgg cgg gca cga acg ttc 2016 Tyr Ser ValIle Arg Glu Val Leu Pro Thr Arg Arg Ala Arg Thr Phe 660 665 670 gac ctcgag gtc gag gaa ctg cac acc ctc gtc gcc gaa ggg gtt gtc 2064 Asp Leu GluVal Glu Glu Leu His Thr Leu Val Ala Glu Gly Val Val 675 680 685 gtg cacaac tgt tcg ccc ccc ttc aag cag gcc gag ttc gac atc ctc 2112 Val His AsnCys Ser Pro Pro Phe Lys Gln Ala Glu Phe Asp Ile Leu 690 695 700 tac ggcaag gga atc agc agg gag ggc tcg ctg atc gac atg ggt gtg 2160 Tyr Gly LysGly Ile Ser Arg Glu Gly Ser Leu Ile Asp Met Gly Val 705 710 715 720 gatcag ggc ctc atc cgc aag tcg ggt gcc tgg ttc acc tac gag ggc 2208 Asp GlnGly Leu Ile Arg Lys Ser Gly Ala Trp Phe Thr Tyr Glu Gly 725 730 735 gagcag ctc ggc cag ggc aag gag aat gcc cgc aac ttc ttg gtg gag 2256 Glu GlnLeu Gly Gln Gly Lys Glu Asn Ala Arg Asn Phe Leu Val Glu 
740 745 750 aacgcc gac gtg gct gac gag atc gag aag aag atc aag gaa aag ctt 2304 Asn AlaAsp Val Ala Asp Glu Ile Glu Lys Lys Ile Lys Glu Lys Leu 755 760 765 ggcatt ggt gcc gtg gtg acc gat gat ccc tca aat gac ggt gtc ctg 2352 Gly IleGly Ala Val Val Thr Asp Asp Pro Ser Asn Asp Gly Val Leu 770 775 780 cccgcc ccc gtc gac ttc tga 2373 Pro Ala Pro Val Asp Phe 785 790 20 790 PRTMycobacterium tuberculosis 20 Met Thr Gln Thr Pro Asp Arg Glu Lys AlaLeu Glu Leu Ala Val Ala 1 5 10 15 Gln Ile Glu Lys Ser Tyr Gly Lys GlySer Val Met Arg Leu Gly Asp 20 25 30 Glu Ala Arg Gln Pro Ile Ser Val IlePro Thr Gly Ser Ile Ala Leu 35 40 45 Asp Val Ala Leu Gly Ile Gly Gly LeuPro Arg Gly Arg Val Ile Glu 50 55 60 Ile Tyr Gly Pro Glu Ser Ser Gly LysThr Thr Val Ala Leu His Ala 65 70 75 80 Val Ala Asn Ala Gln Ala Ala GlyGly Val Ala Ala Phe Ile Asp Ala 85 90 95 Glu His Ala Leu Asp Pro Asp TyrAla Lys Lys Leu Gly Val Asp Thr 100 105 110 Asp Ser Leu Leu Val Ser GlnPro Asp Thr Gly Glu Gln Ala Leu Glu 115 120 125 Ile Ala Asp Met Leu IleArg Ser Gly Ala Leu Asp Ile Val Val Ile 130 135 140 Asp Ser Val Ala AlaLeu Val Pro Arg Ala Glu Leu Glu Gly Glu Met 145 150 155 160 Gly Asp SerHis Val Gly Leu Gln Ala Arg Leu Met Ser Gln Ala Leu 165 170 175 Arg LysMet Thr Gly Ala Leu Asn Asn Ser Gly Thr Thr Ala Ile Phe 180 185 190 IleAsn Gln Leu Arg Asp Lys Ile Gly Val Met Phe Gly Ser Pro Glu 195 200 205Thr Thr Thr Gly Gly Lys Ala Leu Lys Phe Tyr Ala Ser Val Arg Met 210 215220 Asp Val Arg Arg Val Glu Thr Leu Lys Asp Gly Thr Asn Ala Val Gly 225230 235 240 Asn Arg Thr Arg Val Lys Val Val Lys Asn Lys Cys Leu Ala GluGly 245 250 255 Thr Arg Ile Phe Asp Pro Val Thr Gly Thr Thr His Arg IleGlu Asp 260 265 270 Val Val Asp Gly Arg Lys Pro Ile His Val Val Ala AlaAla Lys Asp 275 280 285 Gly Thr Leu His Ala Arg Pro Val Val Ser Trp PheAsp Gln Gly Thr 290 295 300 Arg Asp Val Ile Gly Leu Arg Ile Ala Gly GlyAla Ile Val Trp Ala 305 310 315 320 Thr Pro Asp His Lys Val Leu Thr GluTyr Gly Trp Arg Ala Ala Gly 325 330 335 Glu Leu Arg Lys Gly Asp 
Arg ValAla Gln Pro Arg Arg Phe Asp Gly 340 345 350 Phe Gly Asp Ser Ala Pro IlePro Ala Asp His Ala Arg Leu Leu Gly 355 360 365 Tyr Leu Ile Gly Asp GlyArg Asp Gly Trp Val Gly Gly Lys Thr Pro 370 375 380 Ile Asn Phe Ile AsnVal Gln Arg Ala Leu Ile Asp Asp Val Thr Arg 385 390 395 400 Ile Ala AlaThr Leu Gly Cys Ala Ala His Pro Gln Gly Arg Ile Ser 405 410 415 Leu AlaIle Ala His Arg Pro Gly Glu Arg Asn Gly Val Ala Asp Leu 420 425 430 CysGln Gln Ala Gly Ile Tyr Gly Lys Leu Ala Trp Glu Lys Thr Ile 435 440 445Pro Asn Trp Phe Phe Glu Pro Asp Ile Ala Ala Asp Ile Val Gly Asn 450 455460 Leu Leu Phe Gly Leu Phe Glu Ser Asp Gly Trp Val Ser Arg Glu Gln 465470 475 480 Thr Gly Ala Leu Arg Val Gly Tyr Thr Thr Thr Ser Glu Gln LeuAla 485 490 495 His Gln Ile His Trp Leu Leu Leu Arg Phe Gly Val Gly SerThr Val 500 505 510 Arg Asp Tyr Asp Pro Thr Gln Lys Arg Pro Ser Ile ValAsn Gly Arg 515 520 525 Arg Ile Gln Ser Lys Arg Gln Val Phe Glu Val ArgIle Ser Gly Met 530 535 540 Asp Asn Val Thr Ala Phe Ala Glu Ser Val ProMet Trp Gly Pro Arg 545 550 555 560 Gly Ala Ala Leu Ile Gln Ala Ile ProGlu Ala Thr Gln Gly Arg Arg 565 570 575 Arg Gly Ser Gln Ala Thr Tyr LeuAla Ala Glu Met Thr Asp Ala Val 580 585 590 Leu Asn Tyr Leu Asp Glu ArgGly Val Thr Ala Gln Glu Ala Ala Ala 595 600 605 Met Ile Gly Val Ala SerGly Asp Pro Arg Gly Gly Met Lys Gln Val 610 615 620 Leu Gly Ala Ser ArgLeu Arg Arg Asp Arg Val Gln Ala Leu Ala Asp 625 630 635 640 Ala Leu AspAsp Lys Phe Leu His Asp Met Leu Ala Glu Glu Leu Arg 645 650 655 Tyr SerVal Ile Arg Glu Val Leu Pro Thr Arg Arg Ala Arg Thr Phe 660 665 670 AspLeu Glu Val Glu Glu Leu His Thr Leu Val Ala Glu Gly Val Val 675 680 685Val His Asn Cys Ser Pro Pro Phe Lys Gln Ala Glu Phe Asp Ile Leu 690 695700 Tyr Gly Lys Gly Ile Ser Arg Glu Gly Ser Leu Ile Asp Met Gly Val 705710 715 720 Asp Gln Gly Leu Ile Arg Lys Ser Gly Ala Trp Phe Thr Tyr GluGly 725 730 735 Glu Gln Leu Gly Gln Gly Lys Glu Asn Ala Arg Asn Phe LeuVal Glu 740 745 750 Asn Ala Asp Val Ala Asp Glu Ile Glu Lys Lys Ile LysGlu 
Lys Leu 755 760 765 Gly Ile Gly Ala Val Val Thr Asp Asp Pro Ser AsnAsp Gly Val Leu 770 775 780 Pro Ala Pro Val Asp Phe 785 790 21 4047 DNASaccharomyces cerevisiae CDS (6)..(4037) 21 gaaag atg aag cta ctg tcttct atc gaa caa gca tgc gat att tgc cga 50 Met Lys Leu Leu Ser Ser IleGlu Gln Ala Cys Asp Ile Cys Arg 1 5 10 15 ctt aaa aag ctt aaa tgc tttgcc aag gga acg aat gtt tta atg gcg 98 Leu Lys Lys Leu Lys Cys Phe AlaLys Gly Thr Asn Val Leu Met Ala 20 25 30 gat ggg tct att gaa tgt att gaaaac att gag gtt ggt aat aag gtc 146 Asp Gly Ser Ile Glu Cys Ile Glu AsnIle Glu Val Gly Asn Lys Val 35 40 45 atg ggt aaa gat ggc aga cct cgt gaggta att aaa ttg ccc aga gga 194 Met Gly Lys Asp Gly Arg Pro Arg Glu ValIle Lys Leu Pro Arg Gly 50 55 60 aga gaa act atg tac agc gtc gtg cag aaaagt cag cac aga gcc cac 242 Arg Glu Thr Met Tyr Ser Val Val Gln Lys SerGln His Arg Ala His 65 70 75 aaa agt gac tca agt cgt gaa gtg cca gaa ttactc aag ttt acg tgt 290 Lys Ser Asp Ser Ser Arg Glu Val Pro Glu Leu LeuLys Phe Thr Cys 80 85 90 95 aat gcg acc cat gag ttg gtt gtt aga aca cctcgt agt gtc cgc cgt 338 Asn Ala Thr His Glu Leu Val Val Arg Thr Pro ArgSer Val Arg Arg 100 105 110 ttg tct cgt acc att aag ggt gtc gaa tat tttgaa gtt att act ttt 386 Leu Ser Arg Thr Ile Lys Gly Val Glu Tyr Phe GluVal Ile Thr Phe 115 120 125 gag atg ggc caa aag aaa gcc ccc gac ggt agaatt gtt gag ctt gtc 434 Glu Met Gly Gln Lys Lys Ala Pro Asp Gly Arg IleVal Glu Leu Val 130 135 140 aag gaa gtt tca aag agc tac cca ata tct gagggg cct gag aga gcc 482 Lys Glu Val Ser Lys Ser Tyr Pro Ile Ser Glu GlyPro Glu Arg Ala 145 150 155 aac gaa tta gta gaa tcc tat aga aag gct tcaaat aaa gcc tat ttt 530 Asn Glu Leu Val Glu Ser Tyr Arg Lys Ala Ser AsnLys Ala Tyr Phe 160 165 170 175 gag tgg act att gag gcc aga gat ctt tctctg ttg ggt tcc cat gtt 578 Glu Trp Thr Ile Glu Ala Arg Asp Leu Ser LeuLeu Gly Ser His Val 180 185 190 cgt aaa gct acc tac cag act tac gct ccaatt ctt tat gag aat gac 626 Arg Lys Ala Thr Tyr Gln Thr Tyr Ala Pro IleLeu Tyr Glu Asn 
Asp 195 200 205 cac ttt ttc gac tac atg caa aaa agt aagttt cat ctc acc att gaa 674 His Phe Phe Asp Tyr Met Gln Lys Ser Lys PheHis Leu Thr Ile Glu 210 215 220 ggt cca aaa gta ctt gct tat tta ctt ggttta tgg att ggt gat gga 722 Gly Pro Lys Val Leu Ala Tyr Leu Leu Gly LeuTrp Ile Gly Asp Gly 225 230 235 ttg tct gac agg gca act ttt tcg gtt gattcc aga gat act tct ttg 770 Leu Ser Asp Arg Ala Thr Phe Ser Val Asp SerArg Asp Thr Ser Leu 240 245 250 255 atg gaa cgt gtt act gaa tat gct gaaaag ttg aat ttg tgc gcc gag 818 Met Glu Arg Val Thr Glu Tyr Ala Glu LysLeu Asn Leu Cys Ala Glu 260 265 270 tat aag gac aga aaa gaa cca caa gttgcc aaa act gtt aat ttg tac 866 Tyr Lys Asp Arg Lys Glu Pro Gln Val AlaLys Thr Val Asn Leu Tyr 275 280 285 tct aaa gtt gtc aga ggt aat ggt attcgc aat aat ctt aat act gag 914 Ser Lys Val Val Arg Gly Asn Gly Ile ArgAsn Asn Leu Asn Thr Glu 290 295 300 aat cca tta tgg gac gct att gtt ggctta gga ttc ttg aag gac ggt 962 Asn Pro Leu Trp Asp Ala Ile Val Gly LeuGly Phe Leu Lys Asp Gly 305 310 315 gtc aaa aat att cct tct ttc ttg tctacg gac aat atc ggt act cgt 1010 Val Lys Asn Ile Pro Ser Phe Leu Ser ThrAsp Asn Ile Gly Thr Arg 320 325 330 335 gaa aca ttt ctt gct ggt cta attgat tct gat ggc tat gtt act gat 1058 Glu Thr Phe Leu Ala Gly Leu Ile AspSer Asp Gly Tyr Val Thr Asp 340 345 350 gag cat ggt att aaa gca aca ataaag aca att cat act tct gtc aga 1106 Glu His Gly Ile Lys Ala Thr Ile LysThr Ile His Thr Ser Val Arg 355 360 365 gat ggt ttg gtt tcc ctt gct cgttct tta ggc tta gta gtc tcg gtt 1154 Asp Gly Leu Val Ser Leu Ala Arg SerLeu Gly Leu Val Val Ser Val 370 375 380 aac gca gaa cct gct aag gtt gacatg aat ggc acc aaa cat aaa att 1202 Asn Ala Glu Pro Ala Lys Val Asp MetAsn Gly Thr Lys His Lys Ile 385 390 395 agt tat gct att tat atg tct ggtgga gat gtt ttg ctt aac gtt ctt 1250 Ser Tyr Ala Ile Tyr Met Ser Gly GlyAsp Val Leu Leu Asn Val Leu 400 405 410 415 tcg aag tgt gcc ggc tct aaaaaa ttc agg cct gct ccc gcc gct gct 1298 Ser Lys Cys Ala Gly Ser Lys LysPhe Arg Pro Ala Pro 
Ala Ala Ala 420 425 430 ttt gca cgt gag tgc cgc ggattt tat ttc gag tta caa gaa ttg aag 1346 Phe Ala Arg Glu Cys Arg Gly PheTyr Phe Glu Leu Gln Glu Leu Lys 435 440 445 gaa gac gat tat tat ggg attact tta tct gat gat tct gat cat cag 1394 Glu Asp Asp Tyr Tyr Gly Ile ThrLeu Ser Asp Asp Ser Asp His Gln 450 455 460 ttt ttg ctt gcc aac cag gttgtc gtc cat aat tgc tcc aaa gaa aaa 1442 Phe Leu Leu Ala Asn Gln Val ValVal His Asn Cys Ser Lys Glu Lys 465 470 475 ccg aag tgc gcc aag tgt cttaag aac aac tgg gag tgt cgc tac tct 1490 Pro Lys Cys Ala Lys Cys Leu LysAsn Asn Trp Glu Cys Arg Tyr Ser 480 485 490 495 ccc aaa acc aaa agg tctccg ctg act aga gct cat ctg aca gaa gtg 1538 Pro Lys Thr Lys Arg Ser ProLeu Thr Arg Ala His Leu Thr Glu Val 500 505 510 gaa tca agg cta gaa agactg gaa cag cta ttt cta ctg att ttt cct 1586 Glu Ser Arg Leu Glu Arg LeuGlu Gln Leu Phe Leu Leu Ile Phe Pro 515 520 525 cga gaa gac ctt gac atgatt ttg aaa atg gat tct tta cag gat ata 1634 Arg Glu Asp Leu Asp Met IleLeu Lys Met Asp Ser Leu Gln Asp Ile 530 535 540 aaa gca ttg tta aca ggatta ttt gta caa gat aat gtg aat aaa gat 1682 Lys Ala Leu Leu Thr Gly LeuPhe Val Gln Asp Asn Val Asn Lys Asp 545 550 555 gcc gtc aca gat aga ttggct tca gtg gag act gat atg cct cta aca 1730 Ala Val Thr Asp Arg Leu AlaSer Val Glu Thr Asp Met Pro Leu Thr 560 565 570 575 ttg aga cag cat agaata agt gcg aca tca tca tcg gaa gag agt agt 1778 Leu Arg Gln His Arg IleSer Ala Thr Ser Ser Ser Glu Glu Ser Ser 580 585 590 aac aaa ggt caa agacag ttg act gta tcg att gac tcg gca gct cat 1826 Asn Lys Gly Gln Arg GlnLeu Thr Val Ser Ile Asp Ser Ala Ala His 595 600 605 cat gat aac tcc acaatt ccg ttg gat ttt atg ccc agg gat gct ctt 1874 His Asp Asn Ser Thr IlePro Leu Asp Phe Met Pro Arg Asp Ala Leu 610 615 620 cat gga ttt gat tggtct gaa gag gat gac atg tcg gat ggc ttg ccc 1922 His Gly Phe Asp Trp SerGlu Glu Asp Asp Met Ser Asp Gly Leu Pro 625 630 635 ttc ctg aaa acg gacccc aac aat aat ggg ttc ttt ggc gac ggt tct 1970 Phe Leu Lys Thr Asp ProAsn Asn Asn Gly 
Phe Phe Gly Asp Gly Ser 640 645 650 655 ctc tta tgt attctt cga tct att ggc ttt aaa ccg gaa aat tac acg 2018 Leu Leu Cys Ile LeuArg Ser Ile Gly Phe Lys Pro Glu Asn Tyr Thr 660 665 670 aac tct aac gttaac agg ctc ccg acc atg att acg gat aga tac acg 2066 Asn Ser Asn Val AsnArg Leu Pro Thr Met Ile Thr Asp Arg Tyr Thr 675 680 685 ttg gct tct agatcc aca aca tcc cgt tta ctt caa agt tat ctc aat 2114 Leu Ala Ser Arg SerThr Thr Ser Arg Leu Leu Gln Ser Tyr Leu Asn 690 695 700 aat ttt cac ccctac tgc cct atc gtg cac tca ccg acg cta atg atg 2162 Asn Phe His Pro TyrCys Pro Ile Val His Ser Pro Thr Leu Met Met 705 710 715 ttg tat aat aaccag att gaa atc gcg tcg aag gat caa tgg caa atc 2210 Leu Tyr Asn Asn GlnIle Glu Ile Ala Ser Lys Asp Gln Trp Gln Ile 720 725 730 735 ctt ttt aactgc ata tta gcc att gga gcc tgg tgt ata gag ggg gaa 2258 Leu Phe Asn CysIle Leu Ala Ile Gly Ala Trp Cys Ile Glu Gly Glu 740 745 750 tct act gatata gat gtt ttt tac tat caa aat gct aaa tct cat ttg 2306 Ser Thr Asp IleAsp Val Phe Tyr Tyr Gln Asn Ala Lys Ser His Leu 755 760 765 acg agc aaggtc ttc gag tca ggt tcc ata att ttg gtg aca gcc cta 2354 Thr Ser Lys ValPhe Glu Ser Gly Ser Ile Ile Leu Val Thr Ala Leu 770 775 780 cat ctt ctgtcg cga tat aca cag tgg agg cag aaa aca aat act agc 2402 His Leu Leu SerArg Tyr Thr Gln Trp Arg Gln Lys Thr Asn Thr Ser 785 790 795 tat aat tttcac agc ttt tcc ata aga atg gcc ata tca ttg ggc ttg 2450 Tyr Asn Phe HisSer Phe Ser Ile Arg Met Ala Ile Ser Leu Gly Leu 800 805 810 815 aat agggac ctc ccc tcg tcc ttc agt gat agc agc att ctg gaa caa 2498 Asn Arg AspLeu Pro Ser Ser Phe Ser Asp Ser Ser Ile Leu Glu Gln 820 825 830 aga cgccga att tgg tgg tct gtc tac tct tgg gag atc caa ttg tcc 2546 Arg Arg ArgIle Trp Trp Ser Val Tyr Ser Trp Glu Ile Gln Leu Ser 835 840 845 ctg ctttat ggt cga tcc atc cag ctt tct cag aat aca atc tcc ttc 2594 Leu Leu TyrGly Arg Ser Ile Gln Leu Ser Gln Asn Thr Ile Ser Phe 850 855 860 cct tcttct gtc gac gat gtg cag cgt acc aca aca ggt ccc acc ata 2642 Pro Ser SerVal Asp Asp 
Val Gln Arg Thr Thr Thr Gly Pro Thr Ile 865 870 875 tat catggc atc att gaa aca gca agg ctc tta caa gtt ttc aca aaa 2690 Tyr His GlyIle Ile Glu Thr Ala Arg Leu Leu Gln Val Phe Thr Lys 880 885 890 895 atctat gaa cta gac aaa aca gta act gca gaa aaa agt cct ata tgt 2738 Ile TyrGlu Leu Asp Lys Thr Val Thr Ala Glu Lys Ser Pro Ile Cys 900 905 910 gcaaaa aaa tgc ttg atg att tgt aat gag att gag gag gtt tcg aga 2786 Ala LysLys Cys Leu Met Ile Cys Asn Glu Ile Glu Glu Val Ser Arg 915 920 925 caggca cca aag ttt tta caa atg gat att tcc acc acc gct cta acc 2834 Gln AlaPro Lys Phe Leu Gln Met Asp Ile Ser Thr Thr Ala Leu Thr 930 935 940 aatttg ttg aag gaa cac cct tgg cta tcc ttt aca aga ttc gaa ctg 2882 Asn LeuLeu Lys Glu His Pro Trp Leu Ser Phe Thr Arg Phe Glu Leu 945 950 955 aagtgg aaa cag ttg tct ctt atc att tat gta tta aga gat ttt ttc 2930 Lys TrpLys Gln Leu Ser Leu Ile Ile Tyr Val Leu Arg Asp Phe Phe 960 965 970 975act aat ttt acc cag aaa aag tca caa cta gaa cag gat caa aat gat 2978 ThrAsn Phe Thr Gln Lys Lys Ser Gln Leu Glu Gln Asp Gln Asn Asp 980 985 990cat caa agt tat gaa gtt aaa cga tgc tcc atc atg tta agc gat gca 3026 HisGln Ser Tyr Glu Val Lys Arg Cys Ser Ile Met Leu Ser Asp Ala 995 10001005 gca caa aga act gtt atg tct gta agt agc tat atg gac aat cat aat3074 Ala Gln Arg Thr Val Met Ser Val Ser Ser Tyr Met Asp Asn His Asn1010 1015 1020 gtc acc cca tat ttt gcc tgg aat tgt tct tat tac ttg ttcaat gca 3122 Val Thr Pro Tyr Phe Ala Trp Asn Cys Ser Tyr Tyr Leu Phe AsnAla 1025 1030 1035 gtc cta gta ccc ata aag act cta ctc tca aac tca aaatcg aat gct 3170 Val Leu Val Pro Ile Lys Thr Leu Leu Ser Asn Ser Lys SerAsn Ala 1040 1045 1050 1055 gag aat aac gag acc gca caa tta tta caa caaatt aac act gtt ctg 3218 Glu Asn Asn Glu Thr Ala Gln Leu Leu Gln Gln IleAsn Thr Val Leu 1060 1065 1070 atg cta tta aaa aaa ctg gcc act ttt aaaatc cag act tgt gaa aaa 3266 Met Leu Leu Lys Lys Leu Ala Thr Phe Lys IleGln Thr Cys Glu Lys 1075 1080 1085 tac att caa gta ctg gaa gag gta tgtgcg ccg ttt ctg tta tca 
cag 3314 Tyr Ile Gln Val Leu Glu Glu Val Cys AlaPro Phe Leu Leu Ser Gln 1090 1095 1100 tgt gca atc cca tta ccg cat atcagt tat aac aat agt aat ggt agc 3362 Cys Ala Ile Pro Leu Pro His Ile SerTyr Asn Asn Ser Asn Gly Ser 1105 1110 1115 gcc att aaa aat att gtc ggttct gca act atc gcc caa tac cct act 3410 Ala Ile Lys Asn Ile Val Gly SerAla Thr Ile Ala Gln Tyr Pro Thr 1120 1125 1130 1135 ctt ccg gag gaa aatgtc aac aat atc agt gtt aaa tat gtt tct cct 3458 Leu Pro Glu Glu Asn ValAsn Asn Ile Ser Val Lys Tyr Val Ser Pro 1140 1145 1150 ggc tca gta gggcct tca cct gtg cca ttg aaa tca gga gca agt ttc 3506 Gly Ser Val Gly ProSer Pro Val Pro Leu Lys Ser Gly Ala Ser Phe 1155 1160 1165 agt gat ctagtc aag ctg tta tct aac cgt cca ccc tct cgt aac tct 3554 Ser Asp Leu ValLys Leu Leu Ser Asn Arg Pro Pro Ser Arg Asn Ser 1170 1175 1180 cca gtgaca ata cca aga agc aca cct tcg cat cgc tca gtc acg cct 3602 Pro Val ThrIle Pro Arg Ser Thr Pro Ser His Arg Ser Val Thr Pro 1185 1190 1195 tttcta ggg caa cag caa cag ctg caa tca tta gtg cca ctg acc ccg 3650 Phe LeuGly Gln Gln Gln Gln Leu Gln Ser Leu Val Pro Leu Thr Pro 1200 1205 12101215 tct gct ttg ttt ggt ggc gcc aat ttt aat caa agt ggg aat att gct3698 Ser Ala Leu Phe Gly Gly Ala Asn Phe Asn Gln Ser Gly Asn Ile Ala1220 1225 1230 gat agc tca ttg tcc ttc act ttc act aac agt agc aac ggtccg aac 3746 Asp Ser Ser Leu Ser Phe Thr Phe Thr Asn Ser Ser Asn Gly ProAsn 1235 1240 1245 ctc ata aca act caa aca aat tct caa gcg ctt tca caacca att gcc 3794 Leu Ile Thr Thr Gln Thr Asn Ser Gln Ala Leu Ser Gln ProIle Ala 1250 1255 1260 tcc tct aac gtt cat gat aac ttc atg aat aat gaaatc acg gct agt 3842 Ser Ser Asn Val His Asp Asn Phe Met Asn Asn Glu IleThr Ala Ser 1265 1270 1275 aaa att gat gat ggt aat aat tca aaa cca ctgtca cct ggt tgg acg 3890 Lys Ile Asp Asp Gly Asn Asn Ser Lys Pro Leu SerPro Gly Trp Thr 1280 1285 1290 1295 gac caa act gcg tat aac gcg ttt ggaatc act aca ggg atg ttt aat 3938 Asp Gln Thr Ala Tyr Asn Ala Phe Gly IleThr Thr Gly Met Phe Asn 1300 1305 1310 
acc act aca atg gat gat gta tataac tat cta ttc gat gat gaa gat 3986 Thr Thr Thr Met Asp Asp Val Tyr AsnTyr Leu Phe Asp Asp Glu Asp 1315 1320 1325 acc cca cca aac cca aaa aaagag cag aag ctg atc tcc gag gag gat 4034 Thr Pro Pro Asn Pro Lys Lys GluGln Lys Leu Ile Ser Glu Glu Asp 1330 1335 1340 ctg taggtacccc 4047 Leu22 1344 PRT Saccharomyces cerevisiae 22 Met Lys Leu Leu Ser Ser Ile GluGln Ala Cys Asp Ile Cys Arg Leu 1 5 10 15 Lys Lys Leu Lys Cys Phe AlaLys Gly Thr Asn Val Leu Met Ala Asp 20 25 30 Gly Ser Ile Glu Cys Ile GluAsn Ile Glu Val Gly Asn Lys Val Met 35 40 45 Gly Lys Asp Gly Arg Pro ArgGlu Val Ile Lys Leu Pro Arg Gly Arg 50 55 60 Glu Thr Met Tyr Ser Val ValGln Lys Ser Gln His Arg Ala His Lys 65 70 75 80 Ser Asp Ser Ser Arg GluVal Pro Glu Leu Leu Lys Phe Thr Cys Asn 85 90 95 Ala Thr His Glu Leu ValVal Arg Thr Pro Arg Ser Val Arg Arg Leu 100 105 110 Ser Arg Thr Ile LysGly Val Glu Tyr Phe Glu Val Ile Thr Phe Glu 115 120 125 Met Gly Gln LysLys Ala Pro Asp Gly Arg Ile Val Glu Leu Val Lys 130 135 140 Glu Val SerLys Ser Tyr Pro Ile Ser Glu Gly Pro Glu Arg Ala Asn 145 150 155 160 GluLeu Val Glu Ser Tyr Arg Lys Ala Ser Asn Lys Ala Tyr Phe Glu 165 170 175Trp Thr Ile Glu Ala Arg Asp Leu Ser Leu Leu Gly Ser His Val Arg 180 185190 Lys Ala Thr Tyr Gln Thr Tyr Ala Pro Ile Leu Tyr Glu Asn Asp His 195200 205 Phe Phe Asp Tyr Met Gln Lys Ser Lys Phe His Leu Thr Ile Glu Gly210 215 220 Pro Lys Val Leu Ala Tyr Leu Leu Gly Leu Trp Ile Gly Asp GlyLeu 225 230 235 240 Ser Asp Arg Ala Thr Phe Ser Val Asp Ser Arg Asp ThrSer Leu Met 245 250 255 Glu Arg Val Thr Glu Tyr Ala Glu Lys Leu Asn LeuCys Ala Glu Tyr 260 265 270 Lys Asp Arg Lys Glu Pro Gln Val Ala Lys ThrVal Asn Leu Tyr Ser 275 280 285 Lys Val Val Arg Gly Asn Gly Ile Arg AsnAsn Leu Asn Thr Glu Asn 290 295 300 Pro Leu Trp Asp Ala Ile Val Gly LeuGly Phe Leu Lys Asp Gly Val 305 310 315 320 Lys Asn Ile Pro Ser Phe LeuSer Thr Asp Asn Ile Gly Thr Arg Glu 325 330 335 Thr Phe Leu Ala Gly LeuIle Asp Ser Asp Gly Tyr Val Thr Asp Glu 340 345 
350 His Gly Ile Lys AlaThr Ile Lys Thr Ile His Thr Ser Val Arg Asp 355 360 365 Gly Leu Val SerLeu Ala Arg Ser Leu Gly Leu Val Val Ser Val Asn 370 375 380 Ala Glu ProAla Lys Val Asp Met Asn Gly Thr Lys His Lys Ile Ser 385 390 395 400 TyrAla Ile Tyr Met Ser Gly Gly Asp Val Leu Leu Asn Val Leu Ser 405 410 415Lys Cys Ala Gly Ser Lys Lys Phe Arg Pro Ala Pro Ala Ala Ala Phe 420 425430 Ala Arg Glu Cys Arg Gly Phe Tyr Phe Glu Leu Gln Glu Leu Lys Glu 435440 445 Asp Asp Tyr Tyr Gly Ile Thr Leu Ser Asp Asp Ser Asp His Gln Phe450 455 460 Leu Leu Ala Asn Gln Val Val Val His Asn Cys Ser Lys Glu LysPro 465 470 475 480 Lys Cys Ala Lys Cys Leu Lys Asn Asn Trp Glu Cys ArgTyr Ser Pro 485 490 495 Lys Thr Lys Arg Ser Pro Leu Thr Arg Ala His LeuThr Glu Val Glu 500 505 510 Ser Arg Leu Glu Arg Leu Glu Gln Leu Phe LeuLeu Ile Phe Pro Arg 515 520 525 Glu Asp Leu Asp Met Ile Leu Lys Met AspSer Leu Gln Asp Ile Lys 530 535 540 Ala Leu Leu Thr Gly Leu Phe Val GlnAsp Asn Val Asn Lys Asp Ala 545 550 555 560 Val Thr Asp Arg Leu Ala SerVal Glu Thr Asp Met Pro Leu Thr Leu 565 570 575 Arg Gln His Arg Ile SerAla Thr Ser Ser Ser Glu Glu Ser Ser Asn 580 585 590 Lys Gly Gln Arg GlnLeu Thr Val Ser Ile Asp Ser Ala Ala His His 595 600 605 Asp Asn Ser ThrIle Pro Leu Asp Phe Met Pro Arg Asp Ala Leu His 610 615 620 Gly Phe AspTrp Ser Glu Glu Asp Asp Met Ser Asp Gly Leu Pro Phe 625 630 635 640 LeuLys Thr Asp Pro Asn Asn Asn Gly Phe Phe Gly Asp Gly Ser Leu 645 650 655Leu Cys Ile Leu Arg Ser Ile Gly Phe Lys Pro Glu Asn Tyr Thr Asn 660 665670 Ser Asn Val Asn Arg Leu Pro Thr Met Ile Thr Asp Arg Tyr Thr Leu 675680 685 Ala Ser Arg Ser Thr Thr Ser Arg Leu Leu Gln Ser Tyr Leu Asn Asn690 695 700 Phe His Pro Tyr Cys Pro Ile Val His Ser Pro Thr Leu Met MetLeu 705 710 715 720 Tyr Asn Asn Gln Ile Glu Ile Ala Ser Lys Asp Gln TrpGln Ile Leu 725 730 735 Phe Asn Cys Ile Leu Ala Ile Gly Ala Trp Cys IleGlu Gly Glu Ser 740 745 750 Thr Asp Ile Asp Val Phe Tyr Tyr Gln Asn AlaLys Ser His Leu Thr 755 760 765 Ser Lys Val Phe Glu Ser Gly 
Ser Ile IleLeu Val Thr Ala Leu His 770 775 780 Leu Leu Ser Arg Tyr Thr Gln Trp ArgGln Lys Thr Asn Thr Ser Tyr 785 790 795 800 Asn Phe His Ser Phe Ser IleArg Met Ala Ile Ser Leu Gly Leu Asn 805 810 815 Arg Asp Leu Pro Ser SerPhe Ser Asp Ser Ser Ile Leu Glu Gln Arg 820 825 830 Arg Arg Ile Trp TrpSer Val Tyr Ser Trp Glu Ile Gln Leu Ser Leu 835 840 845 Leu Tyr Gly ArgSer Ile Gln Leu Ser Gln Asn Thr Ile Ser Phe Pro 850 855 860 Ser Ser ValAsp Asp Val Gln Arg Thr Thr Thr Gly Pro Thr Ile Tyr 865 870 875 880 HisGly Ile Ile Glu Thr Ala Arg Leu Leu Gln Val Phe Thr Lys Ile 885 890 895Tyr Glu Leu Asp Lys Thr Val Thr Ala Glu Lys Ser Pro Ile Cys Ala 900 905910 Lys Lys Cys Leu Met Ile Cys Asn Glu Ile Glu Glu Val Ser Arg Gln 915920 925 Ala Pro Lys Phe Leu Gln Met Asp Ile Ser Thr Thr Ala Leu Thr Asn930 935 940 Leu Leu Lys Glu His Pro Trp Leu Ser Phe Thr Arg Phe Glu LeuLys 945 950 955 960 Trp Lys Gln Leu Ser Leu Ile Ile Tyr Val Leu Arg AspPhe Phe Thr 965 970 975 Asn Phe Thr Gln Lys Lys Ser Gln Leu Glu Gln AspGln Asn Asp His 980 985 990 Gln Ser Tyr Glu Val Lys Arg Cys Ser Ile MetLeu Ser Asp Ala Ala 995 1000 1005 Gln Arg Thr Val Met Ser Val Ser SerTyr Met Asp Asn His Asn Val 1010 1015 1020 Thr Pro Tyr Phe Ala Trp AsnCys Ser Tyr Tyr Leu Phe Asn Ala Val 1025 1030 1035 1040 Leu Val Pro IleLys Thr Leu Leu Ser Asn Ser Lys Ser Asn Ala Glu 1045 1050 1055 Asn AsnGlu Thr Ala Gln Leu Leu Gln Gln Ile Asn Thr Val Leu Met 1060 1065 1070Leu Leu Lys Lys Leu Ala Thr Phe Lys Ile Gln Thr Cys Glu Lys Tyr 10751080 1085 Ile Gln Val Leu Glu Glu Val Cys Ala Pro Phe Leu Leu Ser GlnCys 1090 1095 1100 Ala Ile Pro Leu Pro His Ile Ser Tyr Asn Asn Ser AsnGly Ser Ala 1105 1110 1115 1120 Ile Lys Asn Ile Val Gly Ser Ala Thr IleAla Gln Tyr Pro Thr Leu 1125 1130 1135 Pro Glu Glu Asn Val Asn Asn IleSer Val Lys Tyr Val Ser Pro Gly 1140 1145 1150 Ser Val Gly Pro Ser ProVal Pro Leu Lys Ser Gly Ala Ser Phe Ser 1155 1160 1165 Asp Leu Val LysLeu Leu Ser Asn Arg Pro Pro Ser Arg Asn Ser Pro 1170 1175 1180 Val ThrIle Pro Arg Ser 
Thr Pro Ser His Arg Ser Val Thr Pro Phe 1185 1190 11951200 Leu Gly Gln Gln Gln Gln Leu Gln Ser Leu Val Pro Leu Thr Pro Ser1205 1210 1215 Ala Leu Phe Gly Gly Ala Asn Phe Asn Gln Ser Gly Asn IleAla Asp 1220 1225 1230 Ser Ser Leu Ser Phe Thr Phe Thr Asn Ser Ser AsnGly Pro Asn Leu 1235 1240 1245 Ile Thr Thr Gln Thr Asn Ser Gln Ala LeuSer Gln Pro Ile Ala Ser 1250 1255 1260 Ser Asn Val His Asp Asn Phe MetAsn Asn Glu Ile Thr Ala Ser Lys 1265 1270 1275 1280 Ile Asp Asp Gly AsnAsn Ser Lys Pro Leu Ser Pro Gly Trp Thr Asp 1285 1290 1295 Gln Thr AlaTyr Asn Ala Phe Gly Ile Thr Thr Gly Met Phe Asn Thr 1300 1305 1310 ThrThr Met Asp Asp Val Tyr Asn Tyr Leu Phe Asp Asp Glu Asp Thr 1315 13201325 Pro Pro Asn Pro Lys Lys Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu1330 1335 1340 23 13 PRT Artificial Sequence Description of ArtificialSequence Conserved N1 domain 23 Cys Phe Ala Lys Gly Thr Asn Val Leu MetAla Asp Gly 1 5 10 24 7 PRT Artificial Sequence Description ofArtificial Sequence Conserved N2 domain 24 Ile Glu Val Gly Asn Lys Val 15 25 14 PRT Artificial Sequence Description of Artificial SequenceConserved N3 domain 25 Leu Leu Lys Phe Thr Cys Asn Ala Thr His Glu LeuVal Val 1 5 10 26 16 PRT Artificial Sequence Description of ArtificialSequence Conserved N4 domain 26 Trp Lys Leu Ile Asp Glu Ile Lys Pro GlyAsp Tyr Ala Val Leu Gln 1 5 10 15 27 9 PRT Artificial SequenceDescription of Artificial Sequence Conserved EN1 domain 27 Leu Leu GlyLeu Trp Ile Gly Asp Gly 1 5 28 8 PRT Artificial Sequence Description ofArtificial Sequence Conserved EN2 domain 28 Val Lys Asn Ile Pro Ser PheLeu 1 5 29 10 PRT Artificial Sequence Description of Artificial SequenceConserved EN3 domain 29 Phe Leu Ala Gly Leu Ile Asp Ser Asp Gly 1 5 1030 19 PRT Artificial Sequence Description of Artificial SequenceConserved EN4 domain 30 Thr Ile His Thr Ser Val Arg Asp Gly Leu Val SerLeu Ala Arg Ser 1 5 10 15 Leu Gly Leu 31 8 PRT Artificial SequenceDescription of Artificial Sequence Conserved C1 domain 31 Asn Gln ValVal Val His 
Asn Cys 1 5 32 14 PRT Artificial Sequence Description ofArtificial Sequence Conserved C2 domain 32 Tyr Gly Ile Thr Leu Ser AspAsp Ser Asp His Gln Phe Leu 1 5 10 33 13 PRT Artificial SequenceDescription of Artificial Sequence Conserved N1 domain 33 Cys Xaa XaaXaa Asp Xaa Xaa Xaa Xaa Xaa Xaa Xaa Gly 1 5 10 34 7 PRT ArtificialSequence Description of Artificial Sequence Conserved N2 domain 34 XaaXaa Xaa Gly Xaa Xaa Val 1 5 35 14 PRT Artificial Sequence Description ofArtificial Sequence Conserved N3 domain 35 Gly Xaa Xaa Xaa Xaa Xaa ThrXaa Xaa His Xaa Xaa Xaa Xaa 1 5 10 36 16 PRT Artificial SequenceDescription of Artificial Sequence Conserved N4 domain 36 Trp Xaa XaaXaa Xaa Xaa Xaa Xaa Xaa Xaa Asp Xaa Xaa Xaa Xaa Xaa 1 5 10 15 37 9 PRTArtificial Sequence Description of Artificial Sequence Conserved EN1domain 37 Leu Xaa Gly Xaa Xaa Xaa Xaa Xaa Gly 1 5 38 8 PRT ArtificialSequence Description of Artificial Sequence Conserved EN2 domain 38 XaaLys Xaa Ile Pro Xaa Xaa Xaa 1 5 39 10 PRT Artificial SequenceDescription of Artificial Sequence Conserved EN3 domain 39 Xaa Leu XaaGly Xaa Phe Xaa Xaa Asp Gly 1 5 10 40 19 PRT Artificial SequenceDescription of Artificial Sequence Conserved EN4 domain 40 Xaa Xaa SerXaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Leu Leu Xaa Xaa 1 5 10 15 Xaa GlyIle 41 14 PRT Artificial Sequence Description of Artificial SequenceConserved C1 domain 41 Xaa Val Tyr Asp Leu Xaa Val Xaa Xaa Xaa Xaa XaaPhe Xaa 1 5 10 42 8 PRT Artificial Sequence Description of ArtificialSequence Conserved C2 domain 42 Asn Gly Xaa Xaa Xaa His Asn Xaa 1 5 43454 PRT Artificial Sequence Description of Artificial Sequence SyntheticVMA allele mutation 43 Cys Phe Ala Lys Gly Thr Asn Val Leu Met Ala AspGly Ser Ile Glu 1 5 10 15 Cys Ile Glu Asn Ile Glu Val Gly Asn Lys ValMet Gly Lys Asp Gly 20 25 30 Arg Pro Arg Glu Val Ile Lys Leu Pro Arg GlyArg Glu Thr Met Tyr 35 40 45 Ser Val Val Gln Lys Ser Gln His Arg Ala HisLys Ser Asp Ser Ser 50 55 60 Arg Glu Val Pro Glu Leu Leu Lys Phe Thr CysAsn Ala Thr 
His Glu 65 70 75 80 Leu Val Val Arg Thr Pro Arg Ser Val ArgArg Leu Ser Arg Thr Ile 85 90 95 Lys Gly Val Glu Tyr Phe Glu Val Ile ThrPhe Glu Met Gly Gln Lys 100 105 110 Lys Ala Pro Asp Gly Arg Ile Val GluLeu Val Lys Glu Val Ser Lys 115 120 125 Ser Tyr Pro Ile Ser Glu Gly ProGlu Arg Ala Asn Glu Leu Val Glu 130 135 140 Ser Tyr Arg Lys Ala Ser AsnLys Ala Tyr Phe Glu Trp Thr Ile Glu 145 150 155 160 Ala Arg Asp Leu SerLeu Leu Gly Ser His Val Arg Lys Ala Thr Tyr 165 170 175 Gln Thr Tyr AlaPro Ile Leu Tyr Glu Asn Asp His Phe Phe Asp Tyr 180 185 190 Met Gln LysSer Lys Phe His Leu Thr Ile Glu Gly Pro Lys Val Leu 195 200 205 Ala TyrLeu Leu Gly Leu Trp Ile Gly Asp Gly Leu Ser Asp Arg Ala 210 215 220 ThrPhe Ser Val Asp Ser Arg Asp Thr Ser Leu Met Glu Arg Val Thr 225 230 235240 Glu Tyr Ala Glu Lys Leu Asn Leu Cys Ala Glu Tyr Lys Asp Arg Lys 245250 255 Glu Pro Gln Val Ala Lys Thr Val Asn Leu Tyr Ser Lys Val Val Arg260 265 270 Gly Asn Gly Ile Arg Asn Asn Leu Asn Thr Glu Asn Pro Leu TrpAsp 275 280 285 Ala Ile Val Gly Leu Gly Phe Leu Lys Asp Gly Val Lys AsnIle Pro 290 295 300 Ser Phe Leu Ser Thr Asp Asn Ile Gly Thr Arg Glu ThrPhe Leu Ala 305 310 315 320 Gly Leu Ile Asp Ser Asp Gly Tyr Val Thr AspGlu His Gly Ile Lys 325 330 335 Ala Thr Ile Lys Thr Ile His Thr Ser ValArg Asp Gly Leu Val Ser 340 345 350 Leu Ala Arg Ser Leu Gly Leu Val ValSer Val Asn Ala Glu Pro Ala 355 360 365 Lys Val Asp Met Asn Gly Thr LysHis Lys Ile Ser Tyr Ala Ile Tyr 370 375 380 Met Ser Gly Gly Asp Val LeuLeu Asn Val Leu Ser Lys Cys Ala Gly 385 390 395 400 Ser Lys Lys Phe ArgPro Ala Pro Ala Ala Ala Phe Ala Arg Glu Cys 405 410 415 Arg Gly Phe TyrPhe Glu Leu Gln Glu Leu Lys Glu Asp Asp Tyr Tyr 420 425 430 Gly Ile ThrLeu Ser Asp Asp Ser Asp His Gln Phe Leu Leu Ala Asn 435 440 445 Gln ValLys Ala His Asn 450 44 15 PRT Artificial Sequence Description ofArtificial Sequence Synthetic linker 44 Gly Gly Gly Gly Ser Gly Gly GlyGly Ser Gly Gly Gly Gly Ser 1 5 10 15 45 12 DNA Artificial SequenceDescription of Artificial 
Sequence Synthetic nucleic acid 45 aaaaagcttaag 12 46 35 DNA Artificial Sequence Description of Artificial SequencePrimer 46 tccaaagaaa aaccgaagtg cccaagtgtc ttaag 35 47 277 PRT Herpessimplex virus type 2 47 Leu Leu Arg Val Tyr Ile Asp Gly Pro His Gly ValGly Lys Thr Thr 1 5 10 15 Thr Ser Ala Gln Leu Met Glu Ala Leu Gly ProArg Asp Asn Ile Val 20 25 30 Tyr Val Pro Glu Pro Met Thr Tyr Trp Gln ValLeu Gly Ala Ser Glu 35 40 45 Thr Leu Thr Asn Ile Tyr Asn Thr Gln His ArgLeu Asp Arg Gly Glu 50 55 60 Ile Ser Ala Gly Glu Ala Ala Val Val Met ThrSer Ala Gln Ile Thr 65 70 75 80 Met Ser Thr Pro Tyr Ala Ala Thr Asp AlaVal Leu Ala Pro His Ile 85 90 95 Gly Gly Glu Ala Val Gly Pro Gln Ala ProPro Pro Ala Leu Thr Leu 100 105 110 Val Phe Asp Arg His Pro Ile Ala SerLeu Leu Cys Tyr Pro Ala Ala 115 120 125 Arg Tyr Leu Met Gly Ser Met ThrPro Gln Ala Val Leu Ala Phe Val 130 135 140 Ala Leu Met Pro Pro Thr AlaPro Gly Thr Asn Leu Val Leu Gly Val 145 150 155 160 Leu Pro Glu Ala GluHis Ala Asp Arg Leu Ala Arg Arg Gln Arg Pro 165 170 175 Gly Glu Arg LeuAsp Leu Ala Met Leu Ser Ala Ile Arg Arg Val Tyr 180 185 190 Asp Leu LeuAla Asn Thr Val Arg Tyr Leu Gln Arg Gly Gly Arg Trp 195 200 205 Arg GluAsp Trp Gly Arg Leu Thr Gly Val Ala Ala Ala Thr Pro Arg 210 215 220 ProAsp Pro Glu Asp Gly Ala Gly Ser Leu Pro Arg Ile Glu Asp Thr 225 230 235240 Leu Ala Leu Phe Arg Val Pro Glu Leu Leu Ala Pro Asn Gly Asp Leu 245250 255 Tyr His Ile Phe Ala Trp Val Leu Asp Val Leu Ala Asp Arg Leu Leu260 265 270 Pro Met His Leu Phe 275 48 269 PRT Bovine herpesvirus 1 48Leu Leu Arg Val Tyr Val Asp Gly Pro His Gly Leu Gly Lys Thr Thr 1 5 1015 Ala Ala Ser Arg Leu Ala Ser Glu Arg Gly Asp Ala Ile Tyr Leu Pro 20 2530 Glu Pro Met Ser Tyr Trp Ser Gly Ala Gly Glu Asp Asp Leu Val Ala 35 4045 Arg Val Tyr Thr Ala Gln His Arg Met Asp Arg Gly Glu Ile Asp Ala 50 5560 Arg Glu Ala Ala Gly Val Val Leu Gly Ala Gln Leu Thr Met Ser Thr 65 7075 80 Pro Tyr Val Ala Leu Asn Gly Leu Ile Ala Pro His Ile Gly Glu Glu 8590 95 Pro Ser Pro Gly Asn Ala Thr 
Pro Pro Asp Leu Ile Leu Ile Phe Asp100 105 110 Arg His Pro Thr Ala Ser Leu Leu Cys Tyr Pro Leu Ala Arg TyrLeu 115 120 125 Thr Arg Cys Leu Pro Ile Glu Ser Val Leu Ser Leu Ile AlaLeu Ile 130 135 140 Pro Pro Thr Pro Pro Gly Thr Asn Leu Ile Leu Gly ThrAla Pro Ala 145 150 155 160 Glu Asp His Leu Ser Arg Leu Val Ala Arg GlyPro Pro Gly Glu Leu 165 170 175 Pro Asp Ala Arg Met Leu Arg Ala Ile ArgTyr Val Tyr Ala Leu Leu 180 185 190 Ala Asn Thr Val Lys Tyr Leu Gln SerGly Gly Ser Trp Arg Ala Asp 195 200 205 Leu Gly Ser Glu Pro Pro Arg LeuPro Leu Ala Pro Pro Glu Ile Gly 210 215 220 Asp Pro Asn Asn Pro Gly GlyHis Asn Thr Leu Leu Ala Leu Ile His 225 230 235 240 Gly Ala Gly Ala ThrArg Gly Cys Ala Ala Met Thr Ser Trp Thr Leu 245 250 255 Asp Leu Leu AlaAsp Arg Leu Arg Ser Met Asn Met Phe 260 265 49 325 PRT Herpes simplexvirus type 2 49 Leu Leu Arg Val Tyr Ile Asp Gly Pro His Gly Val Gly LysThr Thr 1 5 10 15 Thr Ser Ala Gln Leu Met Glu Ala Leu Gly Pro Arg AspAsn Ile Val 20 25 30 Tyr Val Pro Glu Pro Met Thr Tyr Trp Gln Val Leu GlyAla Ser Glu 35 40 45 Thr Leu Thr Asn Ile Tyr Asn Thr Gln His Arg Leu AspArg Gly Glu 50 55 60 Ile Ser Ala Gly Glu Ala Ala Val Val Met Thr Ser AlaGln Ile Thr 65 70 75 80 Met Ser Thr Pro Tyr Ala Ala Thr Asp Ala Val LeuAla Pro His Ile 85 90 95 Gly Gly Glu Ala Val Gly Pro Gln Ala Pro Pro ProAla Leu Thr Leu 100 105 110 Val Phe Asp Arg His Pro Ile Ala Ser Leu LeuCys Tyr Pro Ala Ala 115 120 125 Arg Tyr Leu Met Gly Ser Met Thr Pro GlnAla Val Leu Ala Phe Val 130 135 140 Ala Leu Met Pro Pro Thr Ala Pro GlyThr Asn Leu Val Leu Gly Val 145 150 155 160 Leu Pro Glu Ala Glu His AlaAsp Arg Leu Ala Arg Arg Gln Arg Pro 165 170 175 Gly Glu Arg Leu Asp LeuAla Met Leu Ser Ala Ile Arg Arg Val Tyr 180 185 190 Asp Leu Leu Ala AsnThr Val Arg Tyr Leu Gln Arg Gly Gly Arg Trp 195 200 205 Arg Glu Asp TrpGly Arg Leu Thr Gly Val Ala Ala Ala Thr Pro Arg 210 215 220 Pro Asp ProGlu Asp Gly Ala Gly Ser Leu Pro Arg Ile Glu Asp Thr 225 230 235 240 LeuAla Leu Phe Arg Val Pro Glu Leu Leu Ala Pro Asn 
Gly Asp Leu 245 250 255Tyr His Ile Phe Ala Trp Val Leu Asp Val Leu Ala Asp Arg Leu Leu 260 265270 Pro Met His Leu Phe Val Leu Asp Tyr Asp Gln Ser Pro Val Gly Cys 275280 285 Arg Asp Ala Leu Leu Arg Leu Thr Ala Gly Met Ile Pro Thr Arg Val290 295 300 Thr Thr Ala Gly Ser Ile Ala Glu Ile Arg Asp Leu Ala Arg ThrPhe 305 310 315 320 Ala Arg Glu Val Gly 325 50 317 PRT Pseudorabiesvirus 50 Ile Leu Arg Ile Tyr Leu Asp Gly Ala Tyr Asp Thr Gly Lys Ser Thr1 5 10 15 Thr Ala Arg Val Met Ala Leu Gly Gly Ala Leu Tyr Val Pro GluPro 20 25 30 Met Ala Tyr Trp Arg Thr Leu Phe Asp Thr Asp Thr Val Ala GlyIle 35 40 45 Tyr Asp Ala Gln Thr Arg Lys Gln Asn Gly Ser Leu Ser Glu GluAsp 50 55 60 Ala Ala Leu Val Thr Ala His Asp Gln Ala Ala Phe Ala Thr ProTyr 65 70 75 80 Leu Leu Leu His Thr Arg Leu Val Pro Leu Phe Gly Pro AlaVal Glu 85 90 95 Gly Pro Pro Glu Met Thr Val Val Phe Asp Arg His Pro ValAla Ala 100 105 110 Thr Val Cys Phe Pro Leu Ala Arg Phe Ile Val Gly AspIle Ser Ala 115 120 125 Ala Ala Phe Val Gly Leu Ala Ala Thr Leu Pro GlyGlu Pro Pro Gly 130 135 140 Gly Asn Leu Val Val Ala Ser Leu Asp Pro AspGlu His Leu Arg Arg 145 150 155 160 Leu Arg Ala Arg Ala Arg Ala Gly GluHis Val Asp Ala Arg Leu Leu 165 170 175 Thr Ala Leu Arg Asn Val Tyr AlaMet Leu Val Asn Thr Ser Arg Tyr 180 185 190 Leu Ser Ser Gly Arg Arg TrpArg Asp Asp Trp Gly Arg Ala Pro Arg 195 200 205 Phe Asp Gln Thr Thr ArgAsp Cys Leu Ala Leu Asn Glu Leu Cys Arg 210 215 220 Pro Arg Asp Asp ProGlu Leu Gln Asp Thr Leu Phe Gly Ala Tyr Lys 225 230 235 240 Ala Pro GluLeu Cys Asp Arg Arg Gly Arg Pro Leu Glu Val His Ala 245 250 255 Trp AlaMet Asp Ala Leu Val Ala Lys Leu Leu Pro Leu Arg Val Ser 260 265 270 ThrVal Asp Leu Gly Pro Ser Pro Arg Val Cys Ala Ala Ala Val Ala 275 280 285Ala Gln Thr Arg Gly Met Glu Val Thr Glu Ser Ala Tyr Gly Asp His 290 295300 Ile Arg Gln Cys Val Cys Ala Phe Thr Ser Glu Met Gly 305 310 315 5164 PRT Artificial Sequence Description of Artificial SequenceIllustrative peptide 51 Phe Phe Leu Leu Ser Ser Ser Ser Tyr Tyr Xaa 
Xaa Cys Cys Xaa Trp 1 5 10 15 Leu Leu Leu Leu Pro Pro Pro Pro His His Gln Gln Arg Arg Arg Arg 20 25 30 Ile Ile Ile Met Thr Thr Thr Thr Asn Asn Lys Lys Ser Ser Arg Arg 35 40 45 Val Val Val Val Ala Ala Ala Ala Asp Asp Glu Glu Gly Gly Gly Gly 50 55 60 52 64 DNA Artificial Sequence Description of Artificial Sequence Illustrative nucleic acid 52 tttttttttt ttttttcccc cccccccccc ccaaaaaaaa aaaaaaaagg gggggggggg 60 gggg 64 53 64 DNA Artificial Sequence Description of Artificial Sequence Illustrative nucleic acid 53 ttttccccaa aaggggtttt ccccaaaagg ggttttcccc aaaaggggtt ttccccaaaa 60 gggg 64 54 64 DNA Artificial Sequence Description of Artificial Sequence Illustrative nucleic acid 54 tcagtcagtc agtcagtcag tcagtcagtc agtcagtcag tcagtcagtc agtcagtcag 60 tcag 64
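
By way of illustration only, SEQ ID NOs: 51-54 appear to encode a compact codon table: position i of SEQ ID NO: 51 gives the residue for the codon whose first, second and third bases are position i of SEQ ID NOs: 52, 53 and 54, respectively. The following Python sketch rebuilds the table under that reading. Treating the Xaa positions of SEQ ID NO: 51 as stop codons (written `*`) is an assumption of the sketch, and the names `build_codon_table` and `translate` are hypothetical and do not appear in the listing.

```python
# One-letter rendering of SEQ ID NO: 51. '*' stands in for the Xaa
# positions, which this sketch ASSUMES mark stop codons.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
BASE1 = "t" * 16 + "c" * 16 + "a" * 16 + "g" * 16  # SEQ ID NO: 52
BASE2 = ("tttt" + "cccc" + "aaaa" + "gggg") * 4     # SEQ ID NO: 53
BASE3 = "tcag" * 16                                 # SEQ ID NO: 54


def build_codon_table():
    """Pair position i of the three base strings with residue i."""
    return {
        b1 + b2 + b3: aa
        for b1, b2, b3, aa in zip(BASE1, BASE2, BASE3, AMINO_ACIDS)
    }


def translate(dna):
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    table = build_codon_table()
    dna = dna.lower()
    usable = len(dna) - len(dna) % 3
    return "".join(table[dna[i:i + 3]] for i in range(0, usable, 3))
```

For example, `translate("atgtggtaa")` walks three codons and looks each one up in the rebuilt table; a stop codon is reported as `*` rather than ending the translation.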

We claim:
 1. A method of increasing or decreasing a bioactivity of a target polypeptide comprising: inserting an intein into the target polypeptide, wherein said intein is capable of self-excision; and providing a signal that agonizes or antagonizes the intein excision activity; thereby increasing or decreasing the bioactivity of the target polypeptide by agonizing or antagonizing the intein excision activity.
 2. The method of claim 1, wherein said intein is a conditional mutant intein.
 3. The method of claim 1, wherein said conditional mutant intein is a temperature-sensitive intein.
 4. The method of claim 3, wherein said intein has reduced self-excision activity at temperatures over about 29° C. relative to its self-excision activity at 18° C.
 5. The method of claim 2, wherein said conditional mutant is a cold-sensitive mutant.
 6. The method of claim 5, wherein said intein has reduced self-excision activity at temperatures below about 18° C. relative to its self-excision activity at 30° C.
 7. The method of claim 1, wherein the signal is selected from the group consisting of changes in temperature, alteration of pH, electromagnetic radiation, phosphorylation or dephosphorylation, glycosylation or deglycosylation, changes in the concentration of an ion, changes in the concentration of a metal ion, changes in osmotic pressure, and addition or inactivation of a chemical ligand.
 8. The method of claim 7, wherein the change in temperature is an increase in temperature.
 9. The method of claim 7, wherein the change in temperature is a decrease in temperature.
 10. The method of claim 7, wherein the chemical ligand is a chemical dimerizer.
 11. A method of claim 10, wherein the chemical dimerizer is selected from the group consisting of rapamycin, rapamycin analogs, salicylic acid and abscisic acid.
 12. A method of modulating a bioactivity of a target polypeptide by agonizing or antagonizing the excision of a regulatable intein inserted into the target polypeptide comprising: providing a regulatable intein, wherein said regulatable intein encodes an intein excision activity that can be agonized or antagonized in response to a signal; inserting the intein into the target polypeptide which encodes a bioactivity, such that the inserted intein sequence decreases the bioactivity; and providing a signal that agonizes or antagonizes the intein excision activity; thereby increasing or decreasing, respectively, the bioactivity of the target polypeptide.
 13. The method of claim 12, wherein the regulatable intein is encoded by a nucleic acid that hybridizes under stringent conditions to a nucleic acid selected from the group consisting of SEQ ID Nos. 13, 15, 17 or 19.
 14. The method of claim 12, wherein the regulatable intein is encoded by a nucleic acid which is at least 75% identical to the intein-encoding nucleic acid from any of SEQ ID Nos. 13, 15, 17 or 19.
 15. The method of claim 12, wherein the regulatable intein has a polypeptide sequence at least 75% homologous to the intein polypeptide sequence of any of SEQ ID Nos. 14, 16, 18 or 20.
 16. The method of claim 1 or claim 12, wherein the intein has a polypeptide sequence specified by any of SEQ ID Nos. 2-12.
 17. The method of claim 1 or 12, wherein the target polypeptide is GAL4.
 18. The method of claim 17, wherein the GAL4 target polypeptide is encoded by a nucleic acid which hybridizes under stringent conditions to the nucleic acid of SEQ ID No. 21.
 19. A regulatable intein polypeptide with an amino acid sequence which comprises at least one of the amino acid changes found in a conditional intein allele selected from the group consisting of TS1, TS4, TS8, TS10, TS15, TS17, TS18, TS19, CS1, CS2 and CS3.
 20. The regulatable intein polypeptide of claim 19 which has an amino acid sequence of any of SEQ ID Nos. 2-12.
 21. A mutant intein polypeptide comprising a block C domain mutation wherein the second residue of said block C domain is mutated to a nonhydrophobic amino acid residue.
 22. The mutant intein polypeptide of claim 21, wherein the nonhydrophobic amino acid residue is proline.
 23. A mutant intein polypeptide comprising a block E domain mutation wherein the seventh residue of said block E domain is mutated to a nonacidic amino acid residue.
 24. The mutant intein polypeptide of claim 23, wherein the nonacidic amino acid residue is glycine.
 25. A regulatable intein which is trans-spliced.
 26. The regulatable intein of claim 25, comprising an amino-terminal intein polypeptide, a linker polypeptide, a dimerizable domain and a carboxy-terminal intein polypeptide.
 27. The regulatable intein of claim 26, wherein said linker polypeptide is selected from the group consisting of Asn-Gly repeats, a polyglycine linker, and Gly-Ser repeats.
 28. An isolated nucleic acid which encodes the regulatable intein of any of claims 19, 20, 21, 22, 23, 24, 25, 26 or 27.
 29. A regulatable intein polypeptide which is encoded by a nucleic acid that hybridizes under stringent conditions to a nucleic acid selected from the group consisting of SEQ ID Nos. 13, 15, 17 and 19, wherein said intein is a conditional mutant.
 30. The regulatable intein of claim 29, comprising a block EN1 domain mutation wherein the second residue of said block EN1 domain is mutated to a nonhydrophobic amino acid residue.
 31. The regulatable intein of claim 30, wherein the nonhydrophobic amino acid residue is proline.
 32. The regulatable intein of claim 29, comprising a block EN3 domain mutation wherein the seventh residue of said block EN3 domain is mutated to a nonacidic amino acid residue.
 33. A regulatable intein of claim 32, wherein the nonacidic amino acid residue is glycine.
 34. A regulatable chimeric polypeptide comprising: a target polypeptide having a bioactivity; and an intein, which undergoes self-excision, inserted into the target polypeptide, wherein providing a signal that agonizes or antagonizes the intein self-excision activity causes an increase or decrease, respectively, in the bioactivity of the target polypeptide.
 35. A regulatable chimeric polypeptide comprising: a target polypeptide having a bioactivity; and an intein, which undergoes self-excision, inserted into the target polypeptide, wherein providing a signal that agonizes or antagonizes the intein self-excision activity causes a decrease or increase, respectively, in the bioactivity of the target polypeptide.
 36. A nucleic acid encoding the polypeptide of claim 34 or 35.
 37. The nucleic acid of claim 34 or 35 wherein the nucleic acid encoding the regulatable chimeric polypeptide is operably linked to a transcriptional regulatory sequence.
 38. The nucleic acid of claim 37, wherein the transcriptional regulatory sequence regulates gene expression in mammalian cells.
 39. The nucleic acid of claim 36, wherein the regulatable chimeric polypeptide is a GAL4:Intein hybrid polypeptide.
 40. The nucleic acid of claim 39, wherein the GAL4:Intein hybrid polypeptide has the sequence shown in FIG. 9.
 41. A cell transfected with the nucleic acid of claim 36.
 42. A method for producing a regulatable chimeric polypeptide comprising expressing the nucleic acid of claim 36 in a cell.
 43. An assay for identifying an intein self-excision agonist or antagonist compound using a chimeric polypeptide comprising a target polypeptide which encodes a bioactivity and an intein polypeptide inserted into the target polypeptide comprising: contacting the regulatable chimeric polypeptide with a test compound; and measuring the bioactivity of the target polypeptide wherein a statistically significant increase in the target polypeptide bioactivity in the presence of the test compound, in comparison to the target polypeptide bioactivity in the absence of the test compound, indicates that the test compound is an intein self-excision agonist compound while a statistically significant decrease in the target polypeptide bioactivity in the presence of the test compound, in comparison to the target polypeptide bioactivity in the absence of the test compound, indicates that the test compound is an intein self-excision antagonist compound.
 44. A nucleic acid cloning vector for use in creating a regulatable chimeric polypeptide from a target polypeptide-encoding nucleic acid sequence comprising: a cloning site for an N-Extein-encoding nucleic acid sequence; a regulatable intein-encoding sequence; and a cloning site for a C-Extein-encoding nucleic acid sequence wherein the N-Extein-encoding nucleic acid sequence to be inserted encodes an amino-terminal portion of the target polypeptide and the C-Extein-encoding nucleic acid to be inserted encodes a carboxy-terminal portion of the target polypeptide.
 45. The nucleic acid of claim 44, which further comprises a transcriptional regulatory sequence.
 46. A kit comprising the cloning vector of claim 44.
 47. The kit of claim 46, further comprising a compound which is an agonist or antagonist of the regulatable intein encoded by the regulatable intein-encoding sequence of the cloning vector.
 48. The kit of claim 46, further comprising at least one additional cloning vector in which the reading frame between the N-Extein cloning site and the regulatable intein-encoding sequence or between the regulatable intein and the C-Extein cloning site has been changed by the addition of one or two nucleotides or some multiple of one or two nucleotides.
 49. A method of regulating the level of a target polypeptide comprising: providing a target polypeptide containing at least one internal cysteine residue; inserting a conditional intein with a self-excision activity into said target polypeptide upstream of the internal cysteine residue to produce an unspliced target-intein precursor protein; and providing a signal that agonizes or antagonizes the intein self-excision activity, thereby increasing or decreasing the level of the mature spliced target polypeptide.
 50. The method of claim 49, wherein the target polypeptide is selected from the group consisting of: Gal4, Gal80 and GFP.
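
As a non-limiting illustration of the reading-frame language of claim 48, the following Python sketch shows why vectors differing by zero, one or two added nucleotides at a cloning junction cover all three reading frames. The function names `insert_frame` and `frame_set`, and the spacer sequences `g` and `gg`, are hypothetical choices made for the sketch and are not taken from the specification.

```python
def insert_frame(upstream, spacer):
    """Codon position (0, 1 or 2) at which the first base after the spacer falls.

    `upstream` is the coding sequence read from its first codon; `spacer` is
    the nucleotide(s) added at the junction (hypothetical example values).
    """
    return (len(upstream) + len(spacer)) % 3


def frame_set(upstream, spacers=("", "g", "gg")):
    """Map each spacer to the frame it produces for the downstream sequence."""
    return {spacer: insert_frame(upstream, spacer) for spacer in spacers}
```

Whatever the length of the upstream sequence, the three spacers give three different frames, so one of the three vectors would place a given insert in frame. Adding three nucleotides would return to the original frame, which is why only additions of one or two nucleotides (or lengths equivalent to them modulo three) change it.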